mirror of
git://git.sv.gnu.org/emacs.git
synced 2026-01-13 06:50:39 -08:00
Added commentary about the overall design and its limitations.
This commit is contained in:
parent
a22205d67c
commit
0cd7a14e57
1 changed files with 92 additions and 0 deletions
92
src/w32.c
92
src/w32.c
|
|
@ -1290,6 +1290,98 @@ w32_valid_pointer_p (void *p, int size)
|
|||
|
||||
|
||||
|
||||
/* Here's an overview of how the Windows build supports file names
|
||||
that cannot be encoded by the current system codepage.
|
||||
|
||||
From the POV of Lisp and layers of C code above the functions here,
|
||||
Emacs on Windows pretends that its file names are encoded in UTF-8;
|
||||
see encode_file and decode_file on coding.c. Any file name that is
|
||||
passed as a unibyte string to C functions defined here is assumed
|
||||
to be in UTF-8 encoding. Any file name returned by functions
|
||||
defined here must be in UTF-8 encoding, with only a few exceptions
|
||||
reserved for a couple of special cases. (Be sure to use
|
||||
MAX_UTF8_PATH for char arrays that store UTF-8 encoded file names,
|
||||
as they can be much longer than MAX_PATH!)
|
||||
|
||||
The UTF-8 encoded file names cannot be passed to system APIs, as
|
||||
Windows does not support that. Therefore, they are converted
|
||||
either to UTF-16 or to the ANSI codepage, depending on the value of
|
||||
w32-unicode-filenames, before calling any system APIs or CRT library
|
||||
functions. The default value of that variable is determined by the
|
||||
OS on which Emacs runs: nil on Windows 9X and t otherwise, but the
|
||||
user can change that default (although I don't see why would she
|
||||
want to).
|
||||
|
||||
The 4 functions defined below, filename_to_utf16, filename_to_ansi,
|
||||
filename_from_utf16, and filename_from_ansi, are the workhorses of
|
||||
these conversions. They rely on Windows native APIs
|
||||
MultiByteToWideChar and WideCharToMultiByte; we cannot use
|
||||
functions from coding.c here, because they allocate memory, which
|
||||
is a bad idea on the level of libc, which is what the functions
|
||||
here emulate. (If you worry about performance due to constant
|
||||
conversion back and forth from UTF-8 to UTF-16, then don't: first,
|
||||
it was measured to take only a few microseconds on a not-so-fast
|
||||
machine, and second, that's exactly what the ANSI APIs we used
|
||||
before do anyway, because they are just thin wrappers around the
|
||||
Unicode APIs.)
|
||||
|
||||
The variables file-name-coding-system and default-file-name-coding-system
|
||||
still exist, but are actually used only when a file name needs to
|
||||
be converted to the ANSI codepage. This happens all the time when
|
||||
w32-unicode-filenames is nil, but can also happen from time to time
|
||||
when it is t. Otherwise, these variables have no effect on file-name
|
||||
encoding when w32-unicode-filenames is t; this is similar to
|
||||
selection-coding-system.
|
||||
|
||||
This arrangement works very well, but it has a few gotchas:
|
||||
|
||||
. Lisp code that encodes or decodes file names manually should
|
||||
normally use 'utf-8' as the coding-system on Windows,
|
||||
disregarding file-name-coding-system. This is a somewhat
|
||||
unpleasant consequence, but it cannot be avoided. Fortunately,
|
||||
very few Lisp packages need to do that.
|
||||
|
||||
More generally, passing to library functions (e.g., fopen or
|
||||
opendir) file names already encoded in the ANSI codepage is
|
||||
explictly *verboten*, as all those functions, as shadowed and
|
||||
emulated here, assume they will receive UTF-8 encoded file names.
|
||||
|
||||
For the same reasons, no CRT function or Win32 API can be called
|
||||
directly in Emacs sources, without either converting the file
|
||||
name sfrom UTF-8 to either UTF-16 or ANSI codepage, or going
|
||||
through some shadowing function defined here.
|
||||
|
||||
. File names passed to external libraries, like the image libraries
|
||||
and GnuTLS, need special handling. These libraries generally
|
||||
don't support UTF-16 or UTF-8 file names, so they must get file
|
||||
names encoded in the ANSI codepage. To facilitate using these
|
||||
libraries with file names that are not encodable in the ANSI
|
||||
codepage, use the function ansi_encode_filename, which will try
|
||||
to use the short 8+3 alias of a file name if that file name is
|
||||
not encodable in the ANSI codepage. See image.c and gnutls.c for
|
||||
examples of how this should be done.
|
||||
|
||||
. Running subprocesses in non-ASCII directories and with non-ASCII
|
||||
file arguments is limited to the current codepage (even though
|
||||
Emacs is perfectly capable of finding an executable program file
|
||||
even in a directory whose name cannot be encoded in the curreent
|
||||
codepage). This is because the command-line arguments are
|
||||
encoded _before_ they get to the w32-specific level, and the
|
||||
encoding is not known in advance (it doesn't have to be the
|
||||
current ANSI codepage), so w32proc.c functions cannot re-encode
|
||||
them in UTF-16. This should be fixed, but will also require
|
||||
changes in cmdproxy. The current limitation is not terribly bad
|
||||
anyway, since very few, if any, Windows console programs that are
|
||||
likely to be invoked by Emacs support UTF-16 encoded command
|
||||
lines.
|
||||
|
||||
. For similar reasons, server.el and emacsclient are also limited
|
||||
to the current ANSI codepage for now.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
|
||||
/* Converting file names from UTF-8 to either UTF-16 or the ANSI
|
||||
codepage defined by file-name-coding-system. */
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue