See Bug#24673
* src/eval.c (funcall_lambda): Fix crash for bogus functions such
as (closure).
* test/src/eval-tests.el (eval-tests--bug24673): Add test.
With a single test covering all character classes, it is not always
easy to determine which class is failing. This happens when the failure
is caused by ‘(should (equal (point) (point-max)))’ not being met.
With per-character-class tests, it is immediately obvious which test
causes issues, and tests for all classes are run even if some of them
fail.
* test/src/regex-tests.el (regex-character-classes): Delete and split
into…
(regex-tests-alnum-character-class, regex-tests-alpha-character-class,
regex-tests-ascii-character-class, regex-tests-blank-character-class,
regex-tests-cntrl-character-class, regex-tests-digit-character-class,
regex-tests-graph-character-class, regex-tests-lower-character-class,
regex-tests-multibyte-character-class,
regex-tests-nonascii-character-class,
regex-tests-print-character-class, regex-tests-punct-character-class,
regex-tests-space-character-class,
regex-tests-unibyte-character-class,
regex-tests-upper-character-class, regex-tests-word-character-class,
regex-tests-xdigit-character-class): …new tests.
* src/keyboard.c (parse_solitary_modifier): If the argument SYMBOL
is not a symbol, don't try to recognize it. See
http://lists.gnu.org/archive/html/emacs-devel/2016-08/msg00502.html
for the details.
* test/src/keymap-tests.el (keymap-where-is-internal-test): New
test, for testing the above fix.
[82a487d: Fix reading of regex-resources in regex-tests] attempted to
fix regex-tests failing when run from the source tree (i.e. via make)
by hard-coding the path to the regex-resources directory relative to
the test directory.
This fixed runs from the tree but broke the test when run using other
methods.
Fix by trying ‘load-file-name’ or ‘buffer-file-name’, whichever is set.
* test/src/regex-tests.el (regex-tests--resources-dir): New variable
storing the path to the regex-resources directory.
(regex-tests-generic-line): Use the aforementioned variable.
This fixes the following warning:
In toplevel form:
src/regex-tests.el:416:1:Warning: Unused lexical variable ‘newline’
* test/src/regex-tests.el (regex-tests-BOOST): Remove unused lexical
variable.
* test/src/regex-tests.el (regex-tests): Remove and split into multiple
test cases.
(regex-tests-glibc-BOOST, regex-tests-glibc-PCRE,
regex-tests-glibc-PTESTS, regex-tests-glibc-TESTS): New test cases split
from ‘regex-tests’.
* test/src/regex-tests.el (regex-tests-generic-line): Referring to
‘buffer-file-name’ does not work when running the test from the command
line, i.e. via make, which results in (wrong-type-argument stringp nil)
failures. Replace it with a hard-coded path.
(regex-tests-BOOST, regex-tests-PCRE, regex-tests-PTESTS-whitelist,
regex-tests-TESTS-whitelist): ‘regex-tests-generic-line’ now includes
the ‘regex-resources’ path component so the tests don’t need to specify
it explicitly.
* test/src/regex-resources/BOOST.tests:
* test/src/regex-resources/PCRE.tests:
* test/src/regex-resources/PTESTS:
* test/src/regex-resources/TESTS:
New test data files.
[mina86@mina86.com: Moved files from test/src/regex/* to test/src/*.]
The regex engine tries to optimise Kleene star by avoiding backtracking
when it can detect that star’s operand cannot match what follows it in
the pattern.
For example, when ‘[[:alpha:]]*1’ is matched against ‘foo’, the engine
first tries the longest match for ‘[[:alpha:]]*’, namely ‘foo’, which is
the entire string. The literal digit one still left in the pattern
cannot, however, match the remaining empty string.
Normally, backtracking would then try a shorter match for the character
class (namely ‘fo’, leaving ‘o’ in the string), but since the engine
knows that whatever is put back into the string cannot possibly match
the literal digit one, no backtracking is attempted.
In regexes of the form ‘[[:CC:]]*X’, the optimisation can be applied
if the character class CC does not match character X. In the above
example, this holds because digit one is not in the alpha character
class.
This test is performed by the mutually_exclusive_p function, but it did
not check the class bits of a charset opcode. This resulted in an
assumption that character classes do not match multibyte characters.
For example, it would incorrectly conclude that [[:alpha:]] doesn’t
match ‘ż’.
This, in turn, led to the aforementioned Kleene star optimisation being
incorrectly applied in patterns such as ‘[[:graph:]]*☠’, which should
match ‘☠’ but doesn’t. This can be tested by evaluating
(string-match-p "[[:graph:]]*☠" "☠")
which should return 0 but instead yields nil.
This issue affects any class which matches multibyte characters, i.e.
if ‘[[:cc:]]’ matches a multibyte character X then ‘[[:cc:]]*X’ will
fail to match ‘X’.
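The reasoning above can be illustrated with a toy model. This is only a
sketch under stated assumptions: ‘struct toy_charset’, the CLASS_*
constants and every toy_* function are hypothetical and do not mirror
the real opcode layout or helpers in src/regex.c. The USE_CLASS_BITS
flag switches between the old behaviour (class bits ignored) and the
fixed behaviour (class bits consulted).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical class flags; the real regex code uses its own encoding.  */
enum { CLASS_ALPHA = 1 << 0, CLASS_GRAPH = 1 << 1 };

/* Hypothetical, simplified stand-in for a compiled charset opcode.  */
struct toy_charset
{
  uint8_t bitmap[16];	/* One bit per ASCII character (0..127).  */
  unsigned class_bits;	/* Classes to consult for non-ASCII characters.  */
};

/* Toy classifier (an assumption): every non-ASCII code point belongs to
   any class that was requested.  */
bool
toy_in_class (uint32_t c, unsigned class_bits)
{
  return c >= 0x80 && class_bits != 0;
}

/* Return true if CS matches character C.  When USE_CLASS_BITS is false,
   non-ASCII characters never match, which models the old behaviour.  */
bool
toy_charset_matches (const struct toy_charset *cs, uint32_t c,
		     bool use_class_bits)
{
  assert (cs != NULL);
  if (c < 0x80)
    return (cs->bitmap[c / 8] >> (c % 8)) & 1;
  return use_class_bits && toy_in_class (c, cs->class_bits);
}

/* The star operand CS and the following literal NEXT are mutually
   exclusive when CS cannot match NEXT; only then is it safe to skip
   backtracking.  */
bool
toy_mutually_exclusive (const struct toy_charset *cs, uint32_t next,
			bool use_class_bits)
{
  return !toy_charset_matches (cs, next, use_class_bits);
}
```

In this model, a charset with only CLASS_GRAPH set and NEXT equal to
U+2620 (‘☠’) is reported as mutually exclusive when the class bits are
ignored, so backtracking is wrongly skipped; once the class bits are
consulted the two are no longer considered exclusive.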
* src/regex.c (executing_charset): A new function for executing the
charset and charset_not opcodes. It checks the character, taking into
consideration the existing bitmap, range table and class bits.
It also advances the pointer in the regex bytecode past the parsed
opcode.
(CHARSET_LOOKUP_RANGE_TABLE_RAW, CHARSET_LOOKUP_RANGE_TABLE): Removed.
Code now included in executing_charset.
(mutually_exclusive_p, re_match_2_internal): Changed to take advantage
of executing_charset function.
* test/src/regex-tests.el: New file with tests for the character class
matching.
* src/emacs.c (main) [WINDOWSNT]: Move init_environment calls after the
set_initial_environment call. This prevents Emacs' modifications to the
environment from contaminating Vprocess_environment and
Vinitial_environment (Bug #10980).
* src/callproc.c (getenv_internal) [WINDOWSNT]: Consult Emacs' internal
environment as a fallback to Vprocess_environment.
* test/src/callproc-tests.el (initial-environment-preserved): New test.
* src/editfns.c (styled_format): Don't include padding on the left
in the properties at the beginning of the string. (Bug#23897)
* test/src/editfns-tests.el (format-properties): Add tests for
faces when the string is padded on the left or on the right.
* src/textprop.c (extend_property_ranges): Accept an additional
argument OLD_END, and only extend the end of a property range if
its original end is at OLD_END; all the other ranges are left
intact. (Bug#23897)
* src/editfns.c (styled_format): Pass the original length of the
string to 'extend_property_ranges'.
* src/intervals.h (extend_property_ranges): Adjust prototype.
* test/src/editfns-tests.el (format-properties): Add tests for
bug#23897.
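The OLD_END rule described for extend_property_ranges can be sketched
on a plain array. This is a simplified illustration, not the actual
function: the real code operates on Lisp lists of property ranges, and
‘struct toy_range’ and ‘toy_extend_ranges’ are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one text property range [start, end).  */
struct toy_range
{
  ptrdiff_t start;
  ptrdiff_t end;
};

/* Extend to NEW_END only those ranges whose end is exactly OLD_END;
   every other range is left intact.  */
void
toy_extend_ranges (struct toy_range *ranges, size_t n,
		   ptrdiff_t old_end, ptrdiff_t new_end)
{
  assert (old_end <= new_end);
  for (size_t i = 0; i < n; i++)
    if (ranges[i].end == old_end)
      ranges[i].end = new_end;
}
```

With ranges [0, 3) and [1, 5), an OLD_END of 5 and a NEW_END of 8, only
the second range grows to [1, 8); the first one keeps its end at 3.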
* src/chartab.c (char_table_set_range): Start the loop from the
first character of the block to which FROM belongs. (Bug#23797)
* test/src/chartab-tests.el: New test file.
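As a hedged illustration of why the starting point of such a loop
matters (this is not the actual loop in src/chartab.c; the block size
and all names below are assumptions), consider a loop that steps
through fixed-size blocks. If it starts at FROM itself rather than at
the first character of FROM's block, it can step past TO without ever
visiting the block that contains TO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical block size; the real char-table uses its own sizes.  */
enum { TOY_BLOCK_CHARS = 128 };

/* First character of the block to which C belongs.  */
int
toy_block_start (int c)
{
  return c - c % TOY_BLOCK_CHARS;
}

/* Count the blocks visited when stepping from FROM to TO one block at a
   time.  With FROM_BLOCK_START false the loop starts at FROM itself.  */
int
toy_blocks_visited (int from, int to, bool from_block_start)
{
  assert (0 <= from && from <= to);
  int visited = 0;
  int c = from_block_start ? toy_block_start (from) : from;
  for (; c <= to; c += TOY_BLOCK_CHARS)
    visited++;
  return visited;
}
```

For the range 100..130, starting at 100 visits one block and then jumps
to 228, skipping the block 128..255 that holds 130; starting at the
block boundary 0 visits both blocks.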
This also fixes the mishandling of "\N{CJK COMPATIBILITY
IDEOGRAPH-F900}", "\N{VARIATION SELECTOR-1}", etc.
Problem reported by Eli Zaretskii in:
http://lists.gnu.org/archive/html/emacs-devel/2016-04/msg00614.html
* doc/lispref/nonascii.texi (Character Codes), etc/NEWS: Document this.
* lisp/international/mule-cmds.el (char-from-name): New function.
(read-char-by-name): Use it. Document that "BED" is treated as
a name, not as a hexadecimal number. Reject out-of-range integers,
floating-point numbers, and strings with trailing junk.
* src/lread.c (character_name_to_code): Call char-from-name
instead of inspecting ucs-names directly, so that we handle
computed names like "VARIATION SELECTOR-1". Do not use an auto
string, since char-from-name might GC.
* test/src/lread-tests.el: Add tests for new behavior, and
fix some old tests that were wrong.
* doc/lispref/nonascii.texi (Character Properties):
Avoid duplication of Unicode names. Reformat examples to fit in
narrow pages.
* doc/lispref/objects.texi (General Escape Syntax):
Simplify and better-organize explanation of \N{...} escapes.
* src/character.h (CHAR_SURROGATE_PAIR_P): Remove; unused.
(char_surrogate_p): New inline function.
* src/lread.c: Do not include string.h; no longer needed.
(invalid_character_name, check_scalar_value): Remove; the ideas
behind these functions are now bundled into character_name_to_code.
(character_name_to_code): Remove undocumented support for "CJK
IDEOGRAPH-XXXX" names, as "U+XXXX" suffices. Reject monstrosities
like "\N{U+-0}" and null bytes in \N escapes. Reject floating
point in \N escapes instead of returning garbage. Use
AUTO_STRING_WITH_LEN to lessen pressure on the garbage collector.
* test/src/lread-tests.el (lread-char-number, lread-char-name)
(lread-string-char-number, lread-string-char-name):
Test runtime behavior, not compile-time, as the test framework
is not set up to test compile-time.
(lread-char-surrogate-1, lread-char-surrogate-2)
(lread-char-surrogate-3, lread-char-surrogate-4)
(lread-string-char-number-2, lread-string-char-number-3):
New tests.
(lread-string-char-number-1): Rename from lread-string-char-number.
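The stricter handling of "U+XXXX" names described above for
character_name_to_code can be sketched as follows. This is a minimal
sketch, not the code in src/lread.c: ‘toy_parse_u_plus’ is a
hypothetical helper, and the upper bound of 0x10FFFF is an assumption
based on the Unicode code point range. Taking an explicit length lets
the sketch reject embedded null bytes as well as signs, decimal points
and trailing junk.

```c
#include <assert.h>
#include <stddef.h>

/* Parse NAME (LEN bytes, not necessarily NUL-terminated) of the form
   "U+" followed by one or more hexadecimal digits.  Return the code
   point, or -1 if NAME contains anything else or is out of range.  */
long
toy_parse_u_plus (const char *name, size_t len)
{
  assert (name != NULL);
  if (len < 3 || name[0] != 'U' || name[1] != '+')
    return -1;

  long code = 0;
  for (size_t i = 2; i < len; i++)
    {
      char ch = name[i];
      int digit;
      if ('0' <= ch && ch <= '9')
	digit = ch - '0';
      else if ('A' <= ch && ch <= 'F')
	digit = ch - 'A' + 10;
      else if ('a' <= ch && ch <= 'f')
	digit = ch - 'a' + 10;
      else
	return -1;	/* Sign, decimal point, null byte or other junk.  */
      code = code * 16 + digit;
      if (code > 0x10FFFF)
	return -1;	/* Out of range (assumed limit).  */
    }
  return code;
}
```

Under these assumptions "U+E9" parses to 0xE9, while "U+-0", "U+1.5",
"U+110000" and a bare "U+" are all rejected.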
* lread.c (invalid_character_name, check_scalar_value)
(parse_code_after_prefix, character_name_to_code): New helper
functions that use 'ucs-names' and parsing for CJK ideographs.
(read_escape): Use helper functions.
(syms_of_lread): New symbol 'ucs-names'.
* test/src/lread-tests.el: New tests; fix a couple of bugs in
existing tests.
* lread.c (init_character_names): New function.
(read_escape): Read Perl-style named character escape sequences.
(syms_of_lread): Initialize new variable 'character_names'.
* test/src/lread-tests.el (lread-char-empty-name): Add test file
for src/lread.c.
* doc/lispref/text.texi (Checksum/Hash): Document `buffer-hash'.
* src/fns.c (Fbuffer_hash): New function.
(make_digest_string): Refactored out into its own function.
(secure_hash): Use it.
* test/src/fns-tests.el (fns-tests-hash-buffer): New tests.