1
Fork 0
mirror of git://git.sv.gnu.org/emacs.git synced 2026-01-16 08:10:43 -08:00
emacs/test
Michal Nazarewicz 6dc6b0079e Fix ‘[[:cc:]]*literal’ regex failing to match ‘literal’ (bug#24020)
The regex engine tries to optimise Kleene star by avoiding backtracking
when it can detect that star’s operand cannot match what follows it in
the pattern.

For example, when ‘[[:alpha:]]*1’ tries to match a ‘foo’, the engine
will test the longest match for ‘[[:alpha:]]*’, namely ’foo’ which is
the entire string.  Literal digit one still present in the pattern will
however not match the remaining empty string.

Normally, backtracking would be performed trying a shorter match for the
character class (namely ‘fo’ leaving ‘o’ in the string), but since the
engine knows whatever would be put back into the string cannot possibly
match literal digit one so no backtracking will be attempted.

In the regexes of the form ‘[[:CC:]]*X’, the optimisation can be applied
if the character class CC does not match character X.  In the above
example, this holds because digit one is not in alpha character class.

This test is performed by mutually_exclusive_p function but it did not
check class bits of a charset opcode.  This resulted in an assumption
that character classes do not match multibyte characters.  For example,
it would incorrectly conclude that [[:alpha:]] doesn’t match ‘ż’.

This, in turn, led to the aforementioned Kleene star optimisation being
incorrectly applied in patterns such as ‘[[:graph:]]*☠’ (which should
match ‘☠’ but doesn’t as can be tested by executing
    (string-match-p "[[:graph:]]*☠" "☠")
which should return 0 but instead yields nil.

This issue affects any class witch matches multibyte characters, i.e.
if ‘[[:cc:]]’ matches a multibyte character X then ‘[[:cc:]]*X’ will
fail to match ‘X’.

* src/regex.c (executing_charset): A new function for executing the
charset and charset_not opcodes.  It performs check on the character
taking into consideration existing bitmap, range table and class bits.
It also advances the pointer in the regex bytecode past the parsed
opcode.
(CHARSET_LOOKUP_RANGE_TABLE_RAW, CHARSET_LOOKUP_RANGE_TABLE): Removed.
Code now included in executing_charset.
(mutually_exclusive_p, re_match_2_internal): Changed to take advantage
of executing_charset function.

* test/src/regex-tests.el: New file with tests for the character class
matching.
2016-07-25 23:52:27 +02:00
..
data
etags
lisp Fix failing test 2016-07-21 21:51:24 +09:00
manual
src Fix ‘[[:cc:]]*literal’ regex failing to match ‘literal’ (bug#24020) 2016-07-25 23:52:27 +02:00
ChangeLog.1 ; Spelling fixes 2016-06-26 00:06:56 +02:00
file-organisation.org
make-test-deps.emacs-lisp
Makefile.in Run tests from non-byte compiled files 2016-07-07 09:35:11 +01:00
README

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Copyright (C) 2008-2016 Free Software Foundation, Inc.
See the end of the file for license conditions.

This directory contains files intended to test various aspects of
Emacs's functionality.  Please help add tests!

Emacs uses ERT, Emacs Lisp Regression Testing, for testing.  See (info
"(ert)") or https://www.gnu.org/software/emacs/manual/html_node/ert/
for more information on writing and running tests.

The Makefile in this directory supports the following targets:

* make check
  Run all tests as defined in the directory.  Expensive tests are
  suppressed.  The result of the tests for <filename>.el is stored in
  <filename>.log.

* make check-maybe
  Like "make check", but run only the tests for files which have
  unresolved prerequisites.

* make check-expensive
  Like "make check", but run also the tests marked as expensive.

* make <filename>  or  make <filename>.log
  Run all tests declared in <filename>.el.  This includes expensive
  tests.  In the former case the output is shown on the terminal, in
  the latter case the output is written to <filename>.log.

ERT offers selectors, which make it possible to filter out which test
cases shall run.  The make variable $(SELECTOR) gives you a simple
mean to use your own selectors.  The ERT manual describes how
selectors are constructed, see (info "(ert)Test Selectors") or
https://www.gnu.org/software/emacs/manual/html_node/ert/Test-Selectors.html

You could use predefined selectors of the Makefile.  "make <filename>
SELECTOR='$(SELECTOR_DEFAULT)'" runs all tests for <filename>.el
except the tests tagged as expensive.

If your test file contains the tests "test-foo", "test2-foo" and
"test-foo-remote", and you want to run only the former two tests, you
could use a selector regexp: "make <filename> SELECTOR='\"foo$$\"'".


(Also, see etc/compilation.txt for compilation mode font lock tests.)


This file is part of GNU Emacs.

GNU Emacs is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with GNU Emacs.  If not, see <http://www.gnu.org/licenses/>.