* doc/lispref/searching.texi (Search and Replace): Update.
* lisp/bindings.el (mode-line-position): Update callers.
* lisp/subr.el (string-replace): Rename from replace-in-string
since that clashes with XEmacs' replace-in-string which is
equivalent to the Emacs replace-regexp-in-string (bug#43598).
Perform 'regexp-opt' on nested 'or' forms, and after expansion of
user-defined and 'eval' forms. Characters are now turned into strings
for wider 'regexp-opt' scope. This preserves the longest-match
semantics for string in 'or' forms over composition.
* doc/lispref/searching.texi (Rx Constructs): Document.
* lisp/emacs-lisp/rx.el (rx--normalise-or-arg)
(rx--all-string-or-args): New.
(rx--translate-or): Normalise arguments first, and check for strings
in subforms.
(rx--expand-eval): Extracted from rx--translate-eval.
(rx--translate-eval): Call rx--expand-eval.
* test/lisp/emacs-lisp/rx-tests.el (rx-or, rx-def-in-or): Add tests.
* etc/NEWS: Announce.
* doc/lispref/searching.texi (Char Classes): Warn about erroneous
usage of named character classes. Suggested by Stephen Leake
<stephen_leake@stephe-leake.org>.
This argument was added for the 'or' clause in rx, but it turned out
to be a bad idea (bug#37659), and there seems to be little other use
for it.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt): Remove KEEP-ORDER.
* doc/lispref/searching.texi (Regexp Functions):
* etc/NEWS: Remove it from the documentation.
* test/lisp/emacs-lisp/regexp-opt-tests.el (regexp-opt-test--match-all)
(regexp-opt-test--check-perm, regexp-opt-test--explain-perm)
(regexp-opt-keep-order, regexp-opt-longest-match): Simplify test.
Revert to the Emacs 26 semantics that always gave the longest match
for rx 'or' forms with only string arguments. This guarantee was
never well documented, but it is useful and people likely have come to
rely on it. For example, prior to this change,
(rx (or ">" ">="))
matched ">" even if the text contained ">=".
* lisp/emacs-lisp/rx.el (rx--translate-or): Don't tell regexp-opt to
preserve the matching order.
* doc/lispref/searching.texi (Rx Constructs): Document the
longest-match guarantee for all-string 'or' forms.
* test/lisp/emacs-lisp/rx-tests.el (rx-or): Update test.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt):
* doc/lispref/searching.texi (Regexp Functions):
Be more specific about how the KEEP-ORDER argument actually works.
If nil, the regexp guarantees a longest match; this is the behaviour
that many callers implicitly rely on.
The `not' and `intersection' forms, and `or' inside these forms,
now accept characters and single-character strings as arguments.
Previously, they had to be wrapped in `any' forms.
This does not add expressive power but is a convenience and is easily
understood.
* doc/lispref/searching.texi (Rx Constructs): Amend the documentation.
* etc/NEWS: Announce the change.
* lisp/emacs-lisp/rx.el (rx--charset-p, rx--translate-not)
(rx--charset-intervals, rx): Accept characters and 1-char strings in
more places.
* test/lisp/emacs-lisp/rx-tests.el (rx-not, rx-charset-or)
(rx-def-in-charset-or, rx-intersection): Test the change.
These character set operations, together with `not' for set
complement, improve the compositionality of rx, and reduce duplication
in complicated cases. Named character classes are not permitted in
set operations.
* lisp/emacs-lisp/rx.el (rx--translate-any): Split into multiple
functions.
(rx--foldl, rx--parse-any, rx--generate-alt, rx--intervals-to-alt)
(rx--complement-intervals, rx--intersect-intervals)
(rx--union-intervals, rx--charset-intervals, rx--charset-union)
(rx--charset-all, rx--charset-intersection, rx--translate-union)
(rx--translate-intersection): New.
(rx--translate-not, rx--translate-form, rx--builtin-forms, rx):
Add `union' and `intersection'.
* test/lisp/emacs-lisp/rx-tests.el (rx-union ,rx-def-in-union)
(rx-intersection, rx-def-in-intersection): New tests.
* doc/lispref/searching.texi (Rx Constructs):
* etc/NEWS:
Document `union' and `intersection'.
* lisp/emacs-lisp/rx.el (rx--translate-symbol, rx--builtin-symbols, rx):
* test/lisp/emacs-lisp/rx-tests.el (rx-atoms):
* doc/lispref/searching.texi (Rx Constructs):
* etc/NEWS:
Add `anychar', an alias for `anything'. Since `anychar' is more
descriptive (and slightly shorter), treat it as the preferred name.
* lisp/emacs-lisp/rx.el:
* test/lisp/emacs-lisp/rx-tests.el:
* doc/lispref/searching.texi (Rx Constructs):
Rewrite rx for correctness, clarity, and performance. The new
implementation retains full compatibility and has more comprehensive
tests.
* lisp/emacs-lisp/re-builder.el (reb-rx-font-lock-keywords):
Adapt to changes in internal variables in rx.el.
* src/search.c (Fregexp_quote): Only allocate a new string if needed.
* doc/lispref/searching.texi (Regexp Functions):
* etc/NEWS (Incompatible Lisp Changes): Document.
The additions are excluded from the print version to avoid making it
thicker.
* doc/lispref/elisp.texi (Top): New menu entry.
* doc/lispref/searching.texi (Regular Expressions): New menu entry.
(Regexp Example): Add rx form of the example.
(Rx Notation, Rx Constructs, Rx Functions): New nodes.
* doc/lispref/control.texi (pcase Macro): Describe the rx pattern.
`replace-regexp-in-string' omits the first START characters of the
input string in its return value. This is a clear bug, but fixing it
probably causes more trouble; document the behaviour instead (bug#36372).
* doc/lispref/searching.texi (Search and Replace)
* lisp/subr.el (replace-regexp-in-string):
Document current behaviour.
* doc/lispref/searching.texi (Regexp Special):
Mention char classes earlier, in a more-logical place.
Advise sticking to ASCII letters and digits in ranges.
Reword negative advice to make it clearer that it’s negative.
* lisp/files.el (make-auto-save-file-name):
* lisp/gnus/message.el (message-mailer-swallows-blank-line):
* lisp/gnus/nndoc.el (nndoc-lanl-gov-announce-type-p)
(nndoc-generate-lanl-gov-head):
* lisp/org/org-eshell.el (org-eshell-open):
* lisp/org/org.el (org-deadline-time-hour-regexp)
(org-scheduled-time-hour-regexp):
* lisp/progmodes/bat-mode.el (bat-font-lock-keywords):
* lisp/progmodes/bug-reference.el (bug-reference-bug-regexp):
* lisp/textmodes/less-css-mode.el (less-css-font-lock-keywords):
* lisp/vc/vc-cvs.el (vc-cvs-valid-symbolic-tag-name-p):
* lisp/vc/vc-svn.el (vc-svn-valid-symbolic-tag-name-p):
Avoid attempts to chain ranges, as this can be confusing.
For example, instead of [0-9-_.], use [0-9_.-].
* doc/lispref/searching.texi (Regexp Special): Simplify style
advice for order of ], ^, and - in character alternatives.
Stick with saying that it’s not a good idea to put ‘-’ after a
range. Remove the special case about raw 8-bit bytes and
unibyte characters, as this documentation is confusing and
seems to be incorrect in some cases. Say that z-a is the
preferred style for reversed ranges, since it’s clearer and is
typically what’s used in practice. Mention some bad styles:
duplicates in character alternatives, ranges that denote <=3
characters, and ‘-’ as the first character.
* doc/lispref/searching.texi (Regexp Special): Say that
regular expressions like "[a-m-z]" and "[[:alpha:]-~]" should
be avoided, for the same reason that regular expressions like
"+" and "*" should be avoided: POSIX says their behavior is
undefined, and they are confusing anyway. Also, explain
better what happens when the bound of a range is a raw 8-bit
byte; the old explanation appears to have been obsolete
anyway. Finally, say that ranges like "[\u00FF-\xFF]" that
mix non-ASCII characters and raw 8-bit bytes should be
avoided, since it’s not clear what they should mean.
* doc/lispref/searching.texi (Regular Expression Functions):
* lisp/emacs-lisp/regexp-opt.el (regexp-opt):
Rename newly added `noreorder' argument to `keep-order', to avoid a
negative in the name. Suggested by Phil Sainty (Bug#34641).
When regexp-opt is called with an empty list of strings, return a regexp
that doesn't match anything instead of the empty string (Bug#20307).
* doc/lispref/searching.texi (Regular Expression Functions):
* etc/NEWS:
Document the new behaviour.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt):
Return a never-match regexp for empty inputs.
The rx `or' form may reorder its arguments in an unpredictable way,
contrary to user expectation, since it sometimes uses `regexp-opt'.
Add a NOREORDER option to `regexp-opt' for preventing it from
producing a reordered regexp (Bug#34641).
* doc/lispref/searching.texi (Regular Expression Functions):
* etc/NEWS (Lisp Changes in Emacs 27.1):
Describe the new regexp-opt NOREORDER argument.
* lisp/emacs-lisp/regexp-opt.el (regexp-opt): Add NOREORDER.
Make no attempt at regexp improvement if the set of strings contains
a prefix of another string.
(regexp-opt--contains-prefix): New.
* lisp/emacs-lisp/rx.el (rx-or): Call regexp-opt with NOREORDER.
* test/lisp/emacs-lisp/rx-tests.el: Test rx `or' form match order.
* doc/lispref/searching.texi (Character Classes):
* lisp/emacs-lisp/rx.el (rx):
Document that [:cntrl:] excludes DEL.
* test/src/regex-emacs-tests.el (regex-tests-PTESTS-whitelist):
Swap misplaced comments and fix wrong code for DEL.