mirror of
git://git.sv.gnu.org/emacs.git
synced 2025-12-11 00:30:17 -08:00
Merge remote-tracking branch 'origin/master' into feature/android
This commit is contained in:
commit
8f3fee7dff
8 changed files with 167 additions and 61 deletions
|
|
@ -950,8 +950,8 @@ features used mainly in Lisp programs.
|
||||||
@dfn{special constructs} and the rest are @dfn{ordinary}. An ordinary
|
@dfn{special constructs} and the rest are @dfn{ordinary}. An ordinary
|
||||||
character matches that same character and nothing else. The special
|
character matches that same character and nothing else. The special
|
||||||
characters are @samp{$^.*+?[\}. The character @samp{]} is special if
|
characters are @samp{$^.*+?[\}. The character @samp{]} is special if
|
||||||
it ends a character alternative (see below). The character @samp{-}
|
it ends a bracket expression (see below). The character @samp{-}
|
||||||
is special inside a character alternative. Any other character
|
is special inside a bracket expression. Any other character
|
||||||
appearing in a regular expression is ordinary, unless a @samp{\}
|
appearing in a regular expression is ordinary, unless a @samp{\}
|
||||||
precedes it. (When you use regular expressions in a Lisp program,
|
precedes it. (When you use regular expressions in a Lisp program,
|
||||||
each @samp{\} must be doubled, see the example near the end of this
|
each @samp{\} must be doubled, see the example near the end of this
|
||||||
|
|
@ -1033,11 +1033,11 @@ you search for @samp{a.*?$} against the text @samp{abbab} followed by
|
||||||
a newline, it matches the whole string. Since it @emph{can} match
|
a newline, it matches the whole string. Since it @emph{can} match
|
||||||
starting at the first @samp{a}, it does.
|
starting at the first @samp{a}, it does.
|
||||||
|
|
||||||
|
@cindex bracket expression
|
||||||
@cindex set of alternative characters, in regular expressions
|
@cindex set of alternative characters, in regular expressions
|
||||||
@cindex character set, in regular expressions
|
@cindex character set, in regular expressions
|
||||||
@item @kbd{[ @dots{} ]}
|
@item @kbd{[ @dots{} ]}
|
||||||
is a @dfn{set of alternative characters}, or a @dfn{character set},
|
is a @dfn{bracket expression}, which matches one of a set of characters.
|
||||||
beginning with @samp{[} and terminated by @samp{]}.
|
|
||||||
|
|
||||||
In the simplest case, the characters between the two brackets are what
|
In the simplest case, the characters between the two brackets are what
|
||||||
this set can match. Thus, @samp{[ad]} matches either one @samp{a} or
|
this set can match. Thus, @samp{[ad]} matches either one @samp{a} or
|
||||||
|
|
@ -1057,7 +1057,7 @@ Greek letters.
|
||||||
@cindex character classes, in regular expressions
|
@cindex character classes, in regular expressions
|
||||||
You can also include certain special @dfn{character classes} in a
|
You can also include certain special @dfn{character classes} in a
|
||||||
character set. A @samp{[:} and balancing @samp{:]} enclose a
|
character set. A @samp{[:} and balancing @samp{:]} enclose a
|
||||||
character class inside a set of alternative characters. For instance,
|
character class inside a bracket expression. For instance,
|
||||||
@samp{[[:alnum:]]} matches any letter or digit. @xref{Char Classes,,,
|
@samp{[[:alnum:]]} matches any letter or digit. @xref{Char Classes,,,
|
||||||
elisp, The Emacs Lisp Reference Manual}, for a list of character
|
elisp, The Emacs Lisp Reference Manual}, for a list of character
|
||||||
classes.
|
classes.
|
||||||
|
|
@ -1125,7 +1125,7 @@ no preceding expression on which the @samp{*} can act. It is poor practice
|
||||||
to depend on this behavior; it is better to quote the special character anyway,
|
to depend on this behavior; it is better to quote the special character anyway,
|
||||||
regardless of where it appears.
|
regardless of where it appears.
|
||||||
|
|
||||||
As a @samp{\} is not special inside a set of alternative characters, it can
|
As a @samp{\} is not special inside a bracket expression, it can
|
||||||
never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
|
never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
|
||||||
You should not quote these characters when they have no special
|
You should not quote these characters when they have no special
|
||||||
meaning. This would not clarify anything, since backslashes
|
meaning. This would not clarify anything, since backslashes
|
||||||
|
|
|
||||||
|
|
@ -18,11 +18,12 @@ portions of it.
|
||||||
* Searching and Case:: Case-independent or case-significant searching.
|
* Searching and Case:: Case-independent or case-significant searching.
|
||||||
* Regular Expressions:: Describing classes of strings.
|
* Regular Expressions:: Describing classes of strings.
|
||||||
* Regexp Search:: Searching for a match for a regexp.
|
* Regexp Search:: Searching for a match for a regexp.
|
||||||
* POSIX Regexps:: Searching POSIX-style for the longest match.
|
* Longest Match:: Searching for the longest match.
|
||||||
* Match Data:: Finding out which part of the text matched,
|
* Match Data:: Finding out which part of the text matched,
|
||||||
after a string or regexp search.
|
after a string or regexp search.
|
||||||
* Search and Replace:: Commands that loop, searching and replacing.
|
* Search and Replace:: Commands that loop, searching and replacing.
|
||||||
* Standard Regexps:: Useful regexps for finding sentences, pages,...
|
* Standard Regexps:: Useful regexps for finding sentences, pages,...
|
||||||
|
* POSIX Regexps:: Emacs regexps vs POSIX regexps.
|
||||||
@end menu
|
@end menu
|
||||||
|
|
||||||
The @samp{skip-chars@dots{}} functions also perform a kind of searching.
|
The @samp{skip-chars@dots{}} functions also perform a kind of searching.
|
||||||
|
|
@ -277,10 +278,10 @@ character is a simple regular expression that matches that character
|
||||||
and nothing else. The special characters are @samp{.}, @samp{*},
|
and nothing else. The special characters are @samp{.}, @samp{*},
|
||||||
@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
|
@samp{+}, @samp{?}, @samp{[}, @samp{^}, @samp{$}, and @samp{\}; no new
|
||||||
special characters will be defined in the future. The character
|
special characters will be defined in the future. The character
|
||||||
@samp{]} is special if it ends a character alternative (see later).
|
@samp{]} is special if it ends a bracket expression (see later).
|
||||||
The character @samp{-} is special inside a character alternative. A
|
The character @samp{-} is special inside a bracket expression. A
|
||||||
@samp{[:} and balancing @samp{:]} enclose a character class inside a
|
@samp{[:} and balancing @samp{:]} enclose a character class inside a
|
||||||
character alternative. Any other character appearing in a regular
|
bracket expression. Any other character appearing in a regular
|
||||||
expression is ordinary, unless a @samp{\} precedes it.
|
expression is ordinary, unless a @samp{\} precedes it.
|
||||||
|
|
||||||
For example, @samp{f} is not a special character, so it is ordinary, and
|
For example, @samp{f} is not a special character, so it is ordinary, and
|
||||||
|
|
@ -373,19 +374,19 @@ expression @samp{c[ad]*?a}, applied to that same string, matches just
|
||||||
permits the whole expression to match is @samp{d}.)
|
permits the whole expression to match is @samp{d}.)
|
||||||
|
|
||||||
@item @samp{[ @dots{} ]}
|
@item @samp{[ @dots{} ]}
|
||||||
@cindex character alternative (in regexp)
|
@cindex bracket expression (in regexp)
|
||||||
@cindex @samp{[} in regexp
|
@cindex @samp{[} in regexp
|
||||||
@cindex @samp{]} in regexp
|
@cindex @samp{]} in regexp
|
||||||
is a @dfn{character alternative}, which begins with @samp{[} and is
|
is a @dfn{bracket expression}, which begins with @samp{[} and is
|
||||||
terminated by @samp{]}. In the simplest case, the characters between
|
terminated by @samp{]}. In the simplest case, the characters between
|
||||||
the two brackets are what this character alternative can match.
|
the two brackets are what this bracket expression can match.
|
||||||
|
|
||||||
Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
|
Thus, @samp{[ad]} matches either one @samp{a} or one @samp{d}, and
|
||||||
@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
|
@samp{[ad]*} matches any string composed of just @samp{a}s and @samp{d}s
|
||||||
(including the empty string). It follows that @samp{c[ad]*r}
|
(including the empty string). It follows that @samp{c[ad]*r}
|
||||||
matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
|
matches @samp{cr}, @samp{car}, @samp{cdr}, @samp{caddaar}, etc.
|
||||||
|
|
||||||
You can also include character ranges in a character alternative, by
|
You can also include character ranges in a bracket expression, by
|
||||||
writing the starting and ending characters with a @samp{-} between them.
|
writing the starting and ending characters with a @samp{-} between them.
|
||||||
Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
|
Thus, @samp{[a-z]} matches any lower-case @acronym{ASCII} letter.
|
||||||
Ranges may be intermixed freely with individual characters, as in
|
Ranges may be intermixed freely with individual characters, as in
|
||||||
|
|
@ -394,7 +395,7 @@ or @samp{$}, @samp{%} or period. However, the ending character of one
|
||||||
range should not be the starting point of another one; for example,
|
range should not be the starting point of another one; for example,
|
||||||
@samp{[a-m-z]} should be avoided.
|
@samp{[a-m-z]} should be avoided.
|
||||||
|
|
||||||
A character alternative can also specify named character classes
|
A bracket expression can also specify named character classes
|
||||||
(@pxref{Char Classes}). For example, @samp{[[:ascii:]]} matches any
|
(@pxref{Char Classes}). For example, @samp{[[:ascii:]]} matches any
|
||||||
@acronym{ASCII} character. Using a character class is equivalent to
|
@acronym{ASCII} character. Using a character class is equivalent to
|
||||||
mentioning each of the characters in that class; but the latter is not
|
mentioning each of the characters in that class; but the latter is not
|
||||||
|
|
@ -403,9 +404,9 @@ different characters. A character class should not appear as the
|
||||||
lower or upper bound of a range.
|
lower or upper bound of a range.
|
||||||
|
|
||||||
The usual regexp special characters are not special inside a
|
The usual regexp special characters are not special inside a
|
||||||
character alternative. A completely different set of characters is
|
bracket expression. A completely different set of characters is
|
||||||
special: @samp{]}, @samp{-} and @samp{^}.
|
special: @samp{]}, @samp{-} and @samp{^}.
|
||||||
To include @samp{]} in a character alternative, put it at the
|
To include @samp{]} in a bracket expression, put it at the
|
||||||
beginning. To include @samp{^}, put it anywhere but at the beginning.
|
beginning. To include @samp{^}, put it anywhere but at the beginning.
|
||||||
To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches
|
To include @samp{-}, put it at the end. Thus, @samp{[]^-]} matches
|
||||||
all three of these special characters. You cannot use @samp{\} to
|
all three of these special characters. You cannot use @samp{\} to
|
||||||
|
|
@ -443,7 +444,7 @@ characters and raw 8-bit bytes, but not non-ASCII characters. This
|
||||||
feature is intended for searching text in unibyte buffers and strings.
|
feature is intended for searching text in unibyte buffers and strings.
|
||||||
@end enumerate
|
@end enumerate
|
||||||
|
|
||||||
Some kinds of character alternatives are not the best style even
|
Some kinds of bracket expressions are not the best style even
|
||||||
though they have a well-defined meaning in Emacs. They include:
|
though they have a well-defined meaning in Emacs. They include:
|
||||||
|
|
||||||
@enumerate
|
@enumerate
|
||||||
|
|
@ -457,7 +458,7 @@ Unicode character escapes can help here; for example, for most programmers
|
||||||
@samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
|
@samp{[ก-ฺ฿-๛]} is less clear than @samp{[\u0E01-\u0E3A\u0E3F-\u0E5B]}.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
Although a character alternative can include duplicates, it is better
|
Although a bracket expression can include duplicates, it is better
|
||||||
style to avoid them. For example, @samp{[XYa-yYb-zX]} is less clear
|
style to avoid them. For example, @samp{[XYa-yYb-zX]} is less clear
|
||||||
than @samp{[XYa-z]}.
|
than @samp{[XYa-z]}.
|
||||||
|
|
||||||
|
|
@ -468,30 +469,30 @@ is simpler to list the characters. For example,
|
||||||
than @samp{[ij]}, and @samp{[i-k]} is less clear than @samp{[ijk]}.
|
than @samp{[ij]}, and @samp{[i-k]} is less clear than @samp{[ijk]}.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
Although a @samp{-} can appear at the beginning of a character
|
Although a @samp{-} can appear at the beginning of a bracket
|
||||||
alternative or as the upper bound of a range, it is better style to
|
expression or as the upper bound of a range, it is better style to
|
||||||
put @samp{-} by itself at the end of a character alternative. For
|
put @samp{-} by itself at the end of a bracket expression. For
|
||||||
example, although @samp{[-a-z]} is valid, @samp{[a-z-]} is better
|
example, although @samp{[-a-z]} is valid, @samp{[a-z-]} is better
|
||||||
style; and although @samp{[*--]} is valid, @samp{[*+,-]} is clearer.
|
style; and although @samp{[*--]} is valid, @samp{[*+,-]} is clearer.
|
||||||
@end enumerate
|
@end enumerate
|
||||||
|
|
||||||
@item @samp{[^ @dots{} ]}
|
@item @samp{[^ @dots{} ]}
|
||||||
@cindex @samp{^} in regexp
|
@cindex @samp{^} in regexp
|
||||||
@samp{[^} begins a @dfn{complemented character alternative}. This
|
@samp{[^} begins a @dfn{complemented bracket expression}. This
|
||||||
matches any character except the ones specified. Thus,
|
matches any character except the ones specified. Thus,
|
||||||
@samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and
|
@samp{[^a-z0-9A-Z]} matches all characters @emph{except} ASCII letters and
|
||||||
digits.
|
digits.
|
||||||
|
|
||||||
@samp{^} is not special in a character alternative unless it is the first
|
@samp{^} is not special in a bracket expression unless it is the first
|
||||||
character. The character following the @samp{^} is treated as if it
|
character. The character following the @samp{^} is treated as if it
|
||||||
were first (in other words, @samp{-} and @samp{]} are not special there).
|
were first (in other words, @samp{-} and @samp{]} are not special there).
|
||||||
|
|
||||||
A complemented character alternative can match a newline, unless newline is
|
A complemented bracket expression can match a newline, unless newline is
|
||||||
mentioned as one of the characters not to match. This is in contrast to
|
mentioned as one of the characters not to match. This is in contrast to
|
||||||
the handling of regexps in programs such as @code{grep}.
|
the handling of regexps in programs such as @code{grep}.
|
||||||
|
|
||||||
You can specify named character classes, just like in character
|
You can specify named character classes, just like in bracket
|
||||||
alternatives. For instance, @samp{[^[:ascii:]]} matches any
|
expressions. For instance, @samp{[^[:ascii:]]} matches any
|
||||||
non-@acronym{ASCII} character. @xref{Char Classes}.
|
non-@acronym{ASCII} character. @xref{Char Classes}.
|
||||||
|
|
||||||
@item @samp{^}
|
@item @samp{^}
|
||||||
|
|
@ -505,9 +506,10 @@ beginning of a line.
|
||||||
When matching a string instead of a buffer, @samp{^} matches at the
|
When matching a string instead of a buffer, @samp{^} matches at the
|
||||||
beginning of the string or after a newline character.
|
beginning of the string or after a newline character.
|
||||||
|
|
||||||
For historical compatibility reasons, @samp{^} can be used only at the
|
For historical compatibility, @samp{^} is special only at the beginning
|
||||||
beginning of the regular expression, or after @samp{\(}, @samp{\(?:}
|
of the regular expression, or after @samp{\(}, @samp{\(?:} or @samp{\|}.
|
||||||
or @samp{\|}.
|
Although @samp{^} is an ordinary character in other contexts,
|
||||||
|
it is good practice to use @samp{\^} even then.
|
||||||
|
|
||||||
@item @samp{$}
|
@item @samp{$}
|
||||||
@cindex @samp{$} in regexp
|
@cindex @samp{$} in regexp
|
||||||
|
|
@ -519,8 +521,10 @@ matches a string of one @samp{x} or more at the end of a line.
|
||||||
When matching a string instead of a buffer, @samp{$} matches at the end
|
When matching a string instead of a buffer, @samp{$} matches at the end
|
||||||
of the string or before a newline character.
|
of the string or before a newline character.
|
||||||
|
|
||||||
For historical compatibility reasons, @samp{$} can be used only at the
|
For historical compatibility, @samp{$} is special only at the
|
||||||
end of the regular expression, or before @samp{\)} or @samp{\|}.
|
end of the regular expression, or before @samp{\)} or @samp{\|}.
|
||||||
|
Although @samp{$} is an ordinary character in other contexts,
|
||||||
|
it is good practice to use @samp{\$} even then.
|
||||||
|
|
||||||
@item @samp{\}
|
@item @samp{\}
|
||||||
@cindex @samp{\} in regexp
|
@cindex @samp{\} in regexp
|
||||||
|
|
@ -540,14 +544,19 @@ example, the regular expression that matches the @samp{\} character is
|
||||||
@samp{\} is @code{"\\\\"}.
|
@samp{\} is @code{"\\\\"}.
|
||||||
@end table
|
@end table
|
||||||
|
|
||||||
@strong{Please note:} For historical compatibility, special characters
|
For historical compatibility, a repetition operator is treated as ordinary
|
||||||
are treated as ordinary ones if they are in contexts where their special
|
if it appears at the start of a regular expression
|
||||||
meanings make no sense. For example, @samp{*foo} treats @samp{*} as
|
or after @samp{^}, @samp{\(}, @samp{\(?:} or @samp{\|}.
|
||||||
ordinary since there is no preceding expression on which the @samp{*}
|
For example, @samp{*foo} is treated as @samp{\*foo}, and
|
||||||
can act. It is poor practice to depend on this behavior; quote the
|
@samp{two\|^\@{2\@}} is treated as @samp{two\|^@{2@}}.
|
||||||
special character anyway, regardless of where it appears.
|
It is poor practice to depend on this behavior; use proper backslash
|
||||||
|
escaping anyway, regardless of where the repetition operator appears.
|
||||||
|
Also, a repetition operator should not immediately follow a backslash escape
|
||||||
|
that matches only empty strings, as Emacs has bugs in this area.
|
||||||
|
For example, it is unwise to use @samp{\b*}, which can be omitted
|
||||||
|
without changing the documented meaning of the regular expression.
|
||||||
|
|
||||||
As a @samp{\} is not special inside a character alternative, it can
|
As a @samp{\} is not special inside a bracket expression, it can
|
||||||
never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
|
never remove the special meaning of @samp{-}, @samp{^} or @samp{]}.
|
||||||
You should not quote these characters when they have no special
|
You should not quote these characters when they have no special
|
||||||
meaning. This would not clarify anything, since backslashes
|
meaning. This would not clarify anything, since backslashes
|
||||||
|
|
@ -556,23 +565,23 @@ special meaning, as in @samp{[^\]} (@code{"[^\\]"} for Lisp string
|
||||||
syntax), which matches any single character except a backslash.
|
syntax), which matches any single character except a backslash.
|
||||||
|
|
||||||
In practice, most @samp{]} that occur in regular expressions close a
|
In practice, most @samp{]} that occur in regular expressions close a
|
||||||
character alternative and hence are special. However, occasionally a
|
bracket expression and hence are special. However, occasionally a
|
||||||
regular expression may try to match a complex pattern of literal
|
regular expression may try to match a complex pattern of literal
|
||||||
@samp{[} and @samp{]}. In such situations, it sometimes may be
|
@samp{[} and @samp{]}. In such situations, it sometimes may be
|
||||||
necessary to carefully parse the regexp from the start to determine
|
necessary to carefully parse the regexp from the start to determine
|
||||||
which square brackets enclose a character alternative. For example,
|
which square brackets enclose a bracket expression. For example,
|
||||||
@samp{[^][]]} consists of the complemented character alternative
|
@samp{[^][]]} consists of the complemented bracket expression
|
||||||
@samp{[^][]} (which matches any single character that is not a square
|
@samp{[^][]} (which matches any single character that is not a square
|
||||||
bracket), followed by a literal @samp{]}.
|
bracket), followed by a literal @samp{]}.
|
||||||
|
|
||||||
The exact rules are that at the beginning of a regexp, @samp{[} is
|
The exact rules are that at the beginning of a regexp, @samp{[} is
|
||||||
special and @samp{]} not. This lasts until the first unquoted
|
special and @samp{]} not. This lasts until the first unquoted
|
||||||
@samp{[}, after which we are in a character alternative; @samp{[} is
|
@samp{[}, after which we are in a bracket expression; @samp{[} is
|
||||||
no longer special (except when it starts a character class) but @samp{]}
|
no longer special (except when it starts a character class) but @samp{]}
|
||||||
is special, unless it immediately follows the special @samp{[} or that
|
is special, unless it immediately follows the special @samp{[} or that
|
||||||
@samp{[} followed by a @samp{^}. This lasts until the next special
|
@samp{[} followed by a @samp{^}. This lasts until the next special
|
||||||
@samp{]} that does not end a character class. This ends the character
|
@samp{]} that does not end a character class. This ends the bracket
|
||||||
alternative and restores the ordinary syntax of regular expressions;
|
expression and restores the ordinary syntax of regular expressions;
|
||||||
an unquoted @samp{[} is special again and a @samp{]} not.
|
an unquoted @samp{[} is special again and a @samp{]} not.
|
||||||
|
|
||||||
@node Char Classes
|
@node Char Classes
|
||||||
|
|
@ -583,8 +592,8 @@ an unquoted @samp{[} is special again and a @samp{]} not.
|
||||||
@cindex alpha character class, regexp
|
@cindex alpha character class, regexp
|
||||||
@cindex xdigit character class, regexp
|
@cindex xdigit character class, regexp
|
||||||
|
|
||||||
Below is a table of the classes you can use in a character
|
Below is a table of the classes you can use in a bracket
|
||||||
alternative, and what they mean. Note that the @samp{[} and @samp{]}
|
expression, and what they mean. Note that the @samp{[} and @samp{]}
|
||||||
characters that enclose the class name are part of the name, so a
|
characters that enclose the class name are part of the name, so a
|
||||||
regular expression using these classes needs one more pair of
|
regular expression using these classes needs one more pair of
|
||||||
brackets. For example, a regular expression matching a sequence of
|
brackets. For example, a regular expression matching a sequence of
|
||||||
|
|
@ -911,7 +920,7 @@ with a symbol-constituent character.
|
||||||
|
|
||||||
@kindex invalid-regexp
|
@kindex invalid-regexp
|
||||||
Not every string is a valid regular expression. For example, a string
|
Not every string is a valid regular expression. For example, a string
|
||||||
that ends inside a character alternative without a terminating @samp{]}
|
that ends inside a bracket expression without a terminating @samp{]}
|
||||||
is invalid, and so is a string that ends with a single @samp{\}. If
|
is invalid, and so is a string that ends with a single @samp{\}. If
|
||||||
an invalid regular expression is passed to any of the search functions,
|
an invalid regular expression is passed to any of the search functions,
|
||||||
an @code{invalid-regexp} error is signaled.
|
an @code{invalid-regexp} error is signaled.
|
||||||
|
|
@ -948,7 +957,7 @@ deciphered as follows:
|
||||||
|
|
||||||
@table @code
|
@table @code
|
||||||
@item [.?!]
|
@item [.?!]
|
||||||
The first part of the pattern is a character alternative that matches
|
The first part of the pattern is a bracket expression that matches
|
||||||
any one of three characters: period, question mark, and exclamation
|
any one of three characters: period, question mark, and exclamation
|
||||||
mark. The match must begin with one of these three characters. (This
|
mark. The match must begin with one of these three characters. (This
|
||||||
is one point where the new default regexp used by Emacs differs from
|
is one point where the new default regexp used by Emacs differs from
|
||||||
|
|
@ -960,7 +969,7 @@ The second part of the pattern matches any closing braces and quotation
|
||||||
marks, zero or more of them, that may follow the period, question mark
|
marks, zero or more of them, that may follow the period, question mark
|
||||||
or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
|
or exclamation mark. The @code{\"} is Lisp syntax for a double-quote in
|
||||||
a string. The @samp{*} at the end indicates that the immediately
|
a string. The @samp{*} at the end indicates that the immediately
|
||||||
preceding regular expression (a character alternative, in this case) may be
|
preceding regular expression (a bracket expression, in this case) may be
|
||||||
repeated zero or more times.
|
repeated zero or more times.
|
||||||
|
|
||||||
@item \\($\\|@ $\\|\t\\|@ @ \\)
|
@item \\($\\|@ $\\|\t\\|@ @ \\)
|
||||||
|
|
@ -1911,7 +1920,7 @@ attempts. Other zero-width assertions may also bring benefits by
|
||||||
causing a match to fail early.
|
causing a match to fail early.
|
||||||
|
|
||||||
@item
|
@item
|
||||||
Avoid or-patterns in favor of character alternatives: write
|
Avoid or-patterns in favor of bracket expressions: write
|
||||||
@samp{[ab]} instead of @samp{a\|b}. Recall that @samp{\s-} and @samp{\sw}
|
@samp{[ab]} instead of @samp{a\|b}. Recall that @samp{\s-} and @samp{\sw}
|
||||||
are equivalent to @samp{[[:space:]]} and @samp{[[:word:]]}, respectively.
|
are equivalent to @samp{[[:space:]]} and @samp{[[:word:]]}, respectively.
|
||||||
|
|
||||||
|
|
@ -2193,8 +2202,8 @@ constructs, you should bind it temporarily for as small as possible
|
||||||
a part of the code.
|
a part of the code.
|
||||||
@end defvar
|
@end defvar
|
||||||
|
|
||||||
@node POSIX Regexps
|
@node Longest Match
|
||||||
@section POSIX Regular Expression Searching
|
@section Longest-match searching for regular expression matches
|
||||||
|
|
||||||
@cindex backtracking and POSIX regular expressions
|
@cindex backtracking and POSIX regular expressions
|
||||||
The usual regular expression functions do backtracking when necessary
|
The usual regular expression functions do backtracking when necessary
|
||||||
|
|
@ -2209,7 +2218,9 @@ possibilities and found all matches, so they can report the longest
|
||||||
match, as required by POSIX@. This is much slower, so use these
|
match, as required by POSIX@. This is much slower, so use these
|
||||||
functions only when you really need the longest match.
|
functions only when you really need the longest match.
|
||||||
|
|
||||||
The POSIX search and match functions do not properly support the
|
Despite their names, the POSIX search and match functions
|
||||||
|
use Emacs regular expressions, not POSIX regular expressions.
|
||||||
|
@xref{POSIX Regexps}. Also, they do not properly support the
|
||||||
non-greedy repetition operators (@pxref{Regexp Special, non-greedy}).
|
non-greedy repetition operators (@pxref{Regexp Special, non-greedy}).
|
||||||
This is because POSIX backtracking conflicts with the semantics of
|
This is because POSIX backtracking conflicts with the semantics of
|
||||||
non-greedy repetition.
|
non-greedy repetition.
|
||||||
|
|
@ -2957,3 +2968,97 @@ values of the variables @code{sentence-end-double-space}
|
||||||
@code{sentence-end-without-period}, and
|
@code{sentence-end-without-period}, and
|
||||||
@code{sentence-end-without-space}.
|
@code{sentence-end-without-space}.
|
||||||
@end defun
|
@end defun
|
||||||
|
|
||||||
|
@node POSIX Regexps
|
||||||
|
@section Emacs versus POSIX Regular Expressions
|
||||||
|
@cindex POSIX regular expressions
|
||||||
|
|
||||||
|
Regular expression syntax varies signficantly among computer programs.
|
||||||
|
When writing Elisp code that generates regular expressions for use by other
|
||||||
|
programs, it is helpful to know how syntax variants differ.
|
||||||
|
To give a feel for the variation, this section discusses how
|
||||||
|
Emacs regular expressions differ from two syntax variants standarded by POSIX:
|
||||||
|
basic regular expressions (BREs) and extended regular expressions (EREs).
|
||||||
|
Plain @command{grep} uses BREs, and @samp{grep -E} uses EREs.
|
||||||
|
|
||||||
|
Emacs regular expressions have a syntax closer to EREs than to BREs,
|
||||||
|
with some extensions. Here is a summary of how POSIX BREs and EREs
|
||||||
|
differ from Emacs regular expressions.
|
||||||
|
|
||||||
|
@itemize @bullet
|
||||||
|
@item
|
||||||
|
In POSIX BREs @samp{+} and @samp{?} are not special.
|
||||||
|
The only backslash escape sequences are @samp{\(@dots{}\)},
|
||||||
|
@samp{\@{@dots{}\@}}, @samp{\1} through @samp{\9}, along with the
|
||||||
|
escaped special characters @samp{\$}, @samp{\*}, @samp{\.}, @samp{\[},
|
||||||
|
@samp{\\}, and @samp{\^}.
|
||||||
|
Therefore @samp{\(?:} acts like @samp{\([?]:}.
|
||||||
|
POSIX does not define how other BRE escapes behave;
|
||||||
|
for example, GNU @command{grep} treats @samp{\|} like Emacs does,
|
||||||
|
but does not support all the Emacs escapes.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In POSIX EREs @samp{@{}, @samp{(} and @samp{|} are special,
|
||||||
|
and @samp{)} is special when matched with a preceding @samp{(}.
|
||||||
|
These special characters do not use preceding backslashes;
|
||||||
|
@samp{(?} produces undefined results.
|
||||||
|
The only backslash escape sequences are the escaped special characters
|
||||||
|
@samp{\$}, @samp{\(}, @samp{\)}, @samp{\*}, @samp{\+}, @samp{\.},
|
||||||
|
@samp{\?}, @samp{\[}, @samp{\\}, @samp{\^}, @samp{\@{} and @samp{\|}.
|
||||||
|
POSIX does not define how other ERE escapes behave;
|
||||||
|
for example, GNU @samp{grep -E} treats @samp{\1} like Emacs does,
|
||||||
|
but does not support all the Emacs escapes.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In POSIX BREs, it is an implementation option whether @samp{^} is special
|
||||||
|
after @samp{\(}; GNU @command{grep} treats it like Emacs does.
|
||||||
|
In POSIX EREs, @samp{^} is always special outside of bracket expressions,
|
||||||
|
which means the ERE @samp{x^} never matches.
|
||||||
|
In Emacs regular expressions, @samp{^} is special only at the
|
||||||
|
beginning of the regular expression, or after @samp{\(}, @samp{\(?:}
|
||||||
|
or @samp{\|}.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In POSIX BREs, it is an implementation option whether @samp{$} is special
|
||||||
|
before @samp{\)}; GNU @command{grep} treats it like Emacs does.
|
||||||
|
In POSIX EREs, @samp{$} is always special outside of bracket expressions,
|
||||||
|
which means the ERE @samp{$x} never matches.
|
||||||
|
In Emacs regular expressions, @samp{$} is special only at the
|
||||||
|
end of the regular expression, or before @samp{\)} or @samp{\|}.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In POSIX BREs and EREs, undefined results are produced by repetition
|
||||||
|
operators at the start of a regular expression or subexpression
|
||||||
|
(possibly preceded by @samp{^}), except that the repetition operator
|
||||||
|
@samp{*} has the same behavior in BREs as in Emacs.
|
||||||
|
In Emacs, these operators are treated as ordinary.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In BREs and EREs, undefined results are produced by two repetition
|
||||||
|
operators in sequence. In Emacs, these have well-defined behavior,
|
||||||
|
e.g., @samp{a**} is equivalent to @samp{a*}.
|
||||||
|
|
||||||
|
@item
|
||||||
|
In BREs and EREs, undefined results are produced by empty regular
|
||||||
|
expressions or subexpressions. In Emacs these have well-defined
|
||||||
|
behavior, e.g., @samp{\(\)*} matches the empty string,
|
||||||
|
|
||||||
|
@item
|
||||||
|
In BREs and EREs, undefined results are produced for the named
|
||||||
|
character classes @samp{[:ascii:]}, @samp{[:multibyte:]},
|
||||||
|
@samp{[:nonascii:]}, @samp{[:unibyte:]}, and @samp{[:word:]}.
|
||||||
|
|
||||||
|
@item
|
||||||
|
BREs and EREs can contain collating symbols and equivalence
|
||||||
|
class expressions within bracket expressions, e.g., @samp{[[.ch.]d[=a=]]}.
|
||||||
|
Emacs regular expressions do not support this.
|
||||||
|
|
||||||
|
@item
|
||||||
|
BREs, EREs, and the strings they match cannot contain encoding errors
|
||||||
|
or NUL bytes. In Emacs these constructs simply match themselves.
|
||||||
|
|
||||||
|
@item
|
||||||
|
BRE and ERE searching always finds the longest match.
|
||||||
|
Emacs searching by default does not necessarily do so.
|
||||||
|
@xref{Longest Match}.
|
||||||
|
@end itemize
|
||||||
|
|
|
||||||
|
|
@ -1453,7 +1453,7 @@ and initial semicolons."
|
||||||
;; are buffer-local, but we avoid changing them so that they can be set
|
;; are buffer-local, but we avoid changing them so that they can be set
|
||||||
;; to make `forward-paragraph' and friends do something the user wants.
|
;; to make `forward-paragraph' and friends do something the user wants.
|
||||||
;;
|
;;
|
||||||
;; `paragraph-start': The `(' in the character alternative and the
|
;; `paragraph-start': The `(' in the bracket expression and the
|
||||||
;; left-singlequote plus `(' sequence after the \\| alternative prevent
|
;; left-singlequote plus `(' sequence after the \\| alternative prevent
|
||||||
;; sexps and backquoted sexps that follow a docstring from being filled
|
;; sexps and backquoted sexps that follow a docstring from being filled
|
||||||
;; with the docstring. This setting has the consequence of inhibiting
|
;; with the docstring. This setting has the consequence of inhibiting
|
||||||
|
|
|
||||||
|
|
@ -9029,7 +9029,6 @@ is non-numeric or nil fetch the number specified by the
|
||||||
(id (mail-header-id header))
|
(id (mail-header-id header))
|
||||||
(gnus-inhibit-demon t)
|
(gnus-inhibit-demon t)
|
||||||
(gnus-summary-ignore-duplicates t)
|
(gnus-summary-ignore-duplicates t)
|
||||||
(gnus-read-all-available-headers t)
|
|
||||||
(gnus-refer-thread-use-search
|
(gnus-refer-thread-use-search
|
||||||
(if (or (null limit) (numberp limit))
|
(if (or (null limit) (numberp limit))
|
||||||
gnus-refer-thread-use-search
|
gnus-refer-thread-use-search
|
||||||
|
|
@ -9049,7 +9048,8 @@ is non-numeric or nil fetch the number specified by the
|
||||||
(gnus-search-thread header))
|
(gnus-search-thread header))
|
||||||
;; Otherwise just retrieve some headers.
|
;; Otherwise just retrieve some headers.
|
||||||
(t
|
(t
|
||||||
(let* ((limit (if (numberp limit)
|
(let* ((gnus-read-all-available-headers t)
|
||||||
|
(limit (if (numberp limit)
|
||||||
limit
|
limit
|
||||||
gnus-refer-thread-limit))
|
gnus-refer-thread-limit))
|
||||||
(last (if (numberp limit)
|
(last (if (numberp limit)
|
||||||
|
|
|
||||||
|
|
@ -383,7 +383,7 @@ Interactively, ARG is the numeric argument, and defaults to 1."
|
||||||
The syntax for this variable is like the syntax used inside of `[...]'
|
The syntax for this variable is like the syntax used inside of `[...]'
|
||||||
in a regular expression--but without the `[' and the `]'.
|
in a regular expression--but without the `[' and the `]'.
|
||||||
It is NOT a regular expression, and should follow the usual
|
It is NOT a regular expression, and should follow the usual
|
||||||
rules for the contents of a character alternative.
|
rules for the contents of a bracket expression.
|
||||||
It defines a set of \"interesting characters\" to look for when setting
|
It defines a set of \"interesting characters\" to look for when setting
|
||||||
\(or searching for) tab stops, initially \"!-~\" (all printing characters).
|
\(or searching for) tab stops, initially \"!-~\" (all printing characters).
|
||||||
For example, suppose that you are editing a table which is formatted thus:
|
For example, suppose that you are editing a table which is formatted thus:
|
||||||
|
|
|
||||||
|
|
@ -2597,7 +2597,7 @@ regex_compile (re_char *pattern, ptrdiff_t size,
|
||||||
|
|
||||||
/* If followed by a repetition operator. */
|
/* If followed by a repetition operator. */
|
||||||
|| (p != pend
|
|| (p != pend
|
||||||
&& (*p == '*' || *p == '+' || *p == '?' || *p == '^'))
|
&& (*p == '*' || *p == '+' || *p == '?'))
|
||||||
|| (p + 1 < pend && p[0] == '\\' && p[1] == '{'))
|
|| (p + 1 < pend && p[0] == '\\' && p[1] == '{'))
|
||||||
{
|
{
|
||||||
/* Start building a new exactn. */
|
/* Start building a new exactn. */
|
||||||
|
|
|
||||||
|
|
@ -52,7 +52,7 @@
|
||||||
;; no leading/trailing whitespace.
|
;; no leading/trailing whitespace.
|
||||||
(should (equal (eshell-stringify '(1 2 3)) "(1 2 3)"))
|
(should (equal (eshell-stringify '(1 2 3)) "(1 2 3)"))
|
||||||
(should (equal (replace-regexp-in-string
|
(should (equal (replace-regexp-in-string
|
||||||
(rx (+ (or space "\n"))) " "
|
(rx (+ (any space "\n"))) " "
|
||||||
(eshell-stringify '((1 2) (3 . 4))))
|
(eshell-stringify '((1 2) (3 . 4))))
|
||||||
"((1 2) (3 . 4))")))
|
"((1 2) (3 . 4))")))
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1237,8 +1237,6 @@ GUESSED-MAJOR-MODES-SYM are bound to the useful return values of
|
||||||
|
|
||||||
(defvar tramp-histfile-override)
|
(defvar tramp-histfile-override)
|
||||||
(defun eglot--call-with-tramp-test (fn)
|
(defun eglot--call-with-tramp-test (fn)
|
||||||
(unless (>= emacs-major-version 27)
|
|
||||||
(ert-skip "Eglot Tramp support only on Emacs >= 27"))
|
|
||||||
;; Set up a Tramp method that’s just a shell so the remote host is
|
;; Set up a Tramp method that’s just a shell so the remote host is
|
||||||
;; really just the local host.
|
;; really just the local host.
|
||||||
(let* ((tramp-remote-path (cons 'tramp-own-remote-path
|
(let* ((tramp-remote-path (cons 'tramp-own-remote-path
|
||||||
|
|
@ -1260,6 +1258,9 @@ GUESSED-MAJOR-MODES-SYM are bound to the useful return values of
|
||||||
(when (and noninteractive (not (file-directory-p "~/")))
|
(when (and noninteractive (not (file-directory-p "~/")))
|
||||||
(setenv "HOME" temporary-file-directory)))))
|
(setenv "HOME" temporary-file-directory)))))
|
||||||
(default-directory temporary-file-directory))
|
(default-directory temporary-file-directory))
|
||||||
|
;; We must check the remote LSP server. So far, just "clangd" is used.
|
||||||
|
(unless (ignore-errors (executable-find "clangd" 'remote))
|
||||||
|
(ert-skip "Remote clangd not found"))
|
||||||
(funcall fn)))
|
(funcall fn)))
|
||||||
|
|
||||||
(ert-deftest eglot-test-tramp-test ()
|
(ert-deftest eglot-test-tramp-test ()
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue