mirror of
https://gitlab.com/embeddable-common-lisp/ecl.git
synced 2026-01-13 12:52:08 -08:00
103 lines
No EOL
4.9 KiB
XML
103 lines
No EOL
4.9 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE book [
|
|
<!ENTITY % eclent SYSTEM "ecl.ent">
|
|
%eclent;
|
|
]>
|
|
<book xmlns="http://docbook.org/ns/docbook" version="5.0" xml:lang="en">
|
|
<chapter xml:id="ansi.characters">
|
|
<title>Characters</title>
|
|
<para>&ECL; is fully ANSI Common-Lisp compliant in all aspects of the character
|
|
data type, with the following peculiarities.</para>
|
|
|
|
<section xml:id="ansi.characeer.unicode">
|
|
<title>Unicode vs. POSIX locale</title>
|
|
|
|
<para>There are two ways of building &ECL;: with C or with Unicode character codes. These build modes are accessed using the <code>--disable-unicode</code> and <code>--enable-unicode</code> configuration options, the last one being the default.</para>
|
|
|
|
<para>When using C characters we are actually relying on the <type>char</type> type of the C language, using the C library functions for tasks such as character conversions, comparison, etc. In this case characters are typically 8 bit wide and the character order and collation are determines by the current POSIX or C locale. This is not very accurate, leaves out many languages and character encodings but it is sufficient for small applications that do not need multilingual support.</para>
|
|
|
|
<para>When no option is specified &ECL; builds with support for a larger character set, the Unicode 6.0 standard. This uses 24 bit large character codes, also known as <emphasis>codepoints</emphasis>, with a large database of character properties which include their nature (alphanumeric, numeric, etc), their case, their collation properties, whether they are standalone or composing characters, etc.</para>
|
|
|
|
<section xml:id="ansi.character-types">
|
|
<title>Character types</title>
|
|
|
|
<para>If compiled without Unicode support, &ECL; all characters are
|
|
implemented using 8-bit codes and the type <type>extended-char</type>
|
|
is empty. If compiled with Unicode support, characters are implemented
|
|
using 24 bits and the <type>extended-char</type> type covers characters above
|
|
code 255.</para>
|
|
<informaltable>
|
|
<tgroup cols="3">
|
|
<thead>
|
|
<row>
|
|
<entry>Type</entry>
|
|
<entry>With Unicode</entry>
|
|
<entry>Without Unicode</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row>
|
|
<entry><type>standard-char</type></entry>
|
|
<entry>#\Newline,32-126</entry>
|
|
<entry>#\Newline,32-126</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>base-char</type></entry>
|
|
<entry>0-255</entry>
|
|
<entry>0-255</entry>
|
|
</row>
|
|
<row>
|
|
<entry><type>extended-char</type></entry>
|
|
<entry>-</entry>
|
|
<entry>255-16777215</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</informaltable>
|
|
</section>
|
|
|
|
<section xml:id="ansi.character-names">
|
|
<title>Character names</title>
|
|
|
|
<para>All characters have a name. For non-printing characters between 0 and 32, and for 127 we use the ordinary <acronym>ASCII</acronym> names. Characters above 127 are printed and read using hexadecimal Unicode notation, with a <literal>U</literal> followed by 24 bit hexadecimal number, as in <literal>U0126</literal>.</para>
|
|
<table xml:id="table.character-names">
|
|
<title>Examples of character names</title>
|
|
<tgroup cols="2">
|
|
<thead>
|
|
<row>
|
|
<entry>Character</entry>
|
|
<entry>Code</entry>
|
|
</row>
|
|
</thead>
|
|
<tbody>
|
|
<row><entry><literal>#\Null</literal></entry><entry>0</entry></row>
|
|
<row><entry><literal>#\Ack</literal></entry><entry>1</entry></row>
|
|
<row><entry><literal>#\Bell</literal></entry><entry>7</entry></row>
|
|
<row><entry><literal>#\Backspace</literal></entry><entry>8</entry></row>
|
|
<row><entry><literal>#\Tab</literal></entry><entry>9</entry></row>
|
|
<row><entry><literal>#\Newline</literal></entry><entry>10</entry></row>
|
|
<row><entry><literal>#\Linefeed</literal></entry><entry>10</entry></row>
|
|
<row><entry><literal>#\Page</literal></entry><entry>12</entry></row>
|
|
<row><entry><literal>#\Esc</literal></entry><entry>27</entry></row>
|
|
<row><entry><literal>#\Escape</literal></entry><entry>27</entry></row>
|
|
<row><entry><literal>#\Space</literal></entry><entry>32</entry></row>
|
|
<row><entry><literal>#\Rubout</literal></entry><entry>127</entry></row>
|
|
<row><entry><literal>#\U0080</literal></entry><entry>128</entry></row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
<para>Note that <literal>#\Linefeed</literal> is synonymous with
|
|
<literal>#\Newline</literal> and thus is a member of
|
|
<type>standard-char</type>.</para>
|
|
</section>
|
|
</section>
|
|
|
|
<section>
|
|
<title><code>#\Newline</code> characters</title>
|
|
|
|
<para>Internally, &ECL; represents the <literal>#\Newline</literal> character by a single code. However, when using external formats, &ECL; may parse character pairs as a single <literal>#\Newline</literal>, and viceversa, use multiple characters to represent a single <literal>#\Newline</literal>. See <xref linkend="ansi.streams.formats"/>.</para>
|
|
</section>
|
|
|
|
<xi:include href="ref_c_characters.xml" xpointer="ansi.characters.c-dict" xmlns:xi="http://www.w3.org/2001/XInclude"/>
|
|
</chapter>
|
|
</book> |