mirror of
git://git.sv.gnu.org/emacs.git
synced 2025-12-06 06:20:55 -08:00
Improve documentation of letter-case conversions
* doc/lispref/nonascii.texi (Character Properties):
* doc/lispref/strings.texi (Case Conversion, Case Tables): Document
that special-casing rules override the case-table conversions.
(Bug#74155)
parent 0f9d48e99c
commit f7b85fe986
2 changed files with 39 additions and 13 deletions
--- a/doc/lispref/nonascii.texi
+++ b/doc/lispref/nonascii.texi
@@ -632,8 +632,10 @@ is @code{nil}, which means the character itself.
 Corresponds to Unicode language- and context-independent special upper-casing
 rules. The value of this property is a string (which may be empty). For
 example mapping for U+00DF @sc{latin small letter sharp s} is
-@code{"SS"}. For characters with no special mapping, the value is @code{nil}
-which means @code{uppercase} property needs to be consulted instead.
+@code{"SS"}. This mapping overrides the @code{uppercase} property, and
+thus the current case table. For characters with no special mapping,
+the value is @code{nil}, which means @code{uppercase} property needs to
+be consulted instead.
 
 @item special-lowercase
 Corresponds to Unicode language- and context-independent special
@@ -641,16 +643,19 @@ lower-casing rules. The value of this property is a string (which may
 be empty). For example mapping for U+0130 @sc{latin capital letter i
 with dot above} the value is @code{"i\u0307"} (i.e. 2-character string
 consisting of @sc{latin small letter i} followed by U+0307
-@sc{combining dot above}). For characters with no special mapping,
-the value is @code{nil} which means @code{lowercase} property needs to
-be consulted instead.
+@sc{combining dot above}). This mapping overrides the @code{lowercase}
+property, and thus the current case table. For characters with no
+special mapping, the value is @code{nil}, which means @code{lowercase}
+property needs to be consulted instead.
 
 @item special-titlecase
 Corresponds to Unicode unconditional special title-casing rules. The value of
 this property is a string (which may be empty). For example mapping for
-U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}. For
-characters with no special mapping, the value is @code{nil} which means
-@code{titlecase} property needs to be consulted instead.
+U+FB01 @sc{latin small ligature fi} the value is @code{"Fi"}. This
+mapping overrides the @code{titlecase} property, and thus the current
+case table. For characters with no special mapping, the value is
+@code{nil}, which means @code{titlecase} property needs to be consulted
+instead.
 @end table
 
 @defun get-char-code-property char propname
--- a/doc/lispref/strings.texi
+++ b/doc/lispref/strings.texi
@@ -1591,9 +1591,12 @@ using @code{string} function, before being passed to one of the casing
 functions. Of course, no assumptions on the length of the result may
 be made.
 
-Mapping for such special cases are taken from
-@code{special-uppercase}, @code{special-lowercase} and
-@code{special-titlecase} @xref{Character Properties}.
+Other characters can also have special case-conversion rules. They
+all have non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard. These properties define
+special case-conversion rules which override the current case table
+(@pxref{Case Tables}).
 
 @xref{Text Comparison}, for functions that compare strings; some of
 them ignore case differences, or can optionally ignore case differences.
@@ -1634,14 +1637,32 @@ correspondence. There may be two different lower case letters with the
 same upper case equivalent. In these cases, you need to specify the
 maps for both lower case and upper case.
 
-The extra table @var{canonicalize} maps each character to a canonical
+Some characters have special case-conversion rules defined for them,
+which by default override the current case table. These characters have
+non-@code{nil} character properties @code{special-uppercase},
+@code{special-lowercase} or @code{special-titlecase} (@pxref{Character
+Properties}) defined by the Unicode Standard. An example is U+00DF
+LATIN SMALL LETTER SHARP S, @ss{}, which by default up-cases to the
+string @code{"SS"}, not to U+1E9E LATIN CAPITAL LETTER SHARP S@. To
+force these characters follow the case-table conversions, set the
+corresponding Unicode property to @code{nil}:
+
+@example
+(upcase "@ss{}")
+=> "SS"
+(put-char-code-property ?@ss{} 'special-uppercase nil)
+(upcase "@ss{}")
+=> "ẞ"
+@end example
+
+The extra slot @var{canonicalize} of a case table maps each character to a canonical
 equivalent; any two characters that are related by case-conversion have
 the same canonical equivalent character. For example, since @samp{a}
 and @samp{A} are related by case-conversion, they should have the same
 canonical equivalent character (which should be either @samp{a} for both
 of them, or @samp{A} for both of them).
 
-The extra table @var{equivalences} is a map that cyclically permutes
+The extra slot @var{equivalences} is a map that cyclically permutes
 each equivalence class (of characters with the same canonical
 equivalent). (For ordinary @acronym{ASCII}, this would map @samp{a} into
 @samp{A} and @samp{A} into @samp{a}, and likewise for each set of