Sometimes a byte may not be within the character code range. In that case, when
we read the char, the system will signal a condition.
Alternatively (and that's the behavior before this commit) we could return the
character #\Nul. That was done by virtue of ECL_CHAR_CODE skipping tag bytes, so
the returned NIL was treated as 0.
Byte streams transcoding to :ucs-2 and :ucs-4 don't call ecl_set_stream_elt_type,
effectively not initializing .byte_buffer. Moreover, the functions seq_in_read_byte8
and seq_out_write_byte8 assume the vector type to be octet-based, and they
increment the stream position and test for its limit according to that.
That means that ecl_binary_read_byte and ecl_binary_write_byte calls would
segfault when seq_in_read_byte8 and seq_out_write_byte8 are called.
Both conditions could be easily mitigated by initializing .byte_buffer manually
and fixing seq_*_*_byte8 functions to account for the byte size, but there is no
need for that, because for these streams we are not using
ecl_binary_*_byte
ecl_eformat_*_byte
so byte8 functions are not called and .byte_buffer is not used.
Previously sequence streams always needed to go through the eformat and binary
encoders and decoders -- if bytes were too big, then we couldn't create sequence
streams from them.
After this commit it is possible to pass a character stream or a byte stream and
use it as a bivalent stream without a roundtrip for encoding and decoding.
This finishes the commit that adds unread-byte and peek-byte functions to the
mix, in that for bivalent streams UNREAD-BYTE will work for the subsequent
READ-CHAR and vice versa. This also caters to transcoding etc.
The .byte_stack is used only by files to:
a) unread a single octet when we use fallback LISTEN implementation
b) unread bytes that make a character when UNREAD-CHAR is used
The latter is important to transcode characters from one external format to
another (e.g. see the test external-format.0003-transcode-read-char).
This commit improves the function unread-byte to do the same, bringing bivalent
streams almost to parity with regard to that implementation (see next commit).
That makes the implementation of eformat cleaner, .byte_stack more
self-contained, and saves us consing new byte stack for sequence streams (where
it was simply ignored, not to mention not entirely correct - because we've used
a .byte_stack length to decrement the pointer position while the byte could have
more bits than one octet).
Other optimizations that could be done here:
- make the byte stack an adjustable vector to avoid consing on each unread
Previously we've stored in this field the last read char, while now we store
there the last unread char. This way we can't tell whether the last read char
was the same as the unread one, but on the other hand this way requires less
bookkeeping and the code shape is similar to UNREAD-BYTE.
It was used to store bytes for unread, but we are going to change how unread
works, and we still can simply test for newline and encode behavior directly in
unread-char for newlines.
Instead of remembering the last unread object and its type, it simply jots down
the fact that something has been unread (and clears it on read), and delegates the
question to the input stream.
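The flag-and-delegate idea above can be sketched as follows. This is a minimal
illustration and not the actual ECL code: struct in_stream, struct two_way and
all function names are hypothetical, and rejecting a second unread is just one
plausible use of the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input stream: a C string plus one pushed-back element. */
struct in_stream {
    const char *data;  /* remaining input */
    int unread;        /* pushed-back element, or -1 when empty */
};

/* Hypothetical wrapper stream: it only remembers THAT something was unread. */
struct two_way {
    struct in_stream *in;
    bool unread_pending;
};

static int in_read(struct in_stream *s)
{
    assert(s != NULL);
    if (s->unread >= 0) {
        int c = s->unread;
        s->unread = -1;
        return c;
    }
    return *s->data ? (unsigned char)*s->data++ : -1;  /* -1 models EOF */
}

static void in_unread(struct in_stream *s, int c)
{
    s->unread = c;
}

/* Reading clears the flag and delegates to the input stream. */
static int two_way_read(struct two_way *t)
{
    t->unread_pending = false;
    return in_read(t->in);
}

/* Unreading sets the flag; the input stream stores the object itself. */
static bool two_way_unread(struct two_way *t, int c)
{
    if (t->unread_pending)
        return false;  /* two unreads in a row are rejected */
    t->unread_pending = true;
    in_unread(t->in, c);
    return true;
}
```

The wrapper never needs to know whether the unread object was a byte or a
character, because the input stream is the one that hands it back.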
We drop varying generic-read/write variants in favor of using binary encoders
introduced in earlier commits.
This will allow for unified handling of unread bytes and characters and
transcoding both in bivalent streams.
The byte buffer is used for encoding and decoding both characters and bytes.
Previously we've used a stack-allocated array, but this doesn't cut it when it
comes to binary streams, where the byte may be a "finite recognizable subtype of
integer" (cf. the specification of OPEN), because then the array may have more
elements.
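A minimal sketch of why a fixed-size array is not enough: the buffer size
depends on the bit width of the stream's byte. The helper names below are
hypothetical, the 64-bit cap is a simplification of this sketch, and the
big-endian octet order is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Octets needed to hold one stream byte of the given bit width. */
static size_t octets_per_byte(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);  /* cap is a limitation of this sketch */
    return (bits + 7) / 8;
}

/* Heap-allocate a buffer for exactly one stream byte. The caller frees it. */
static unsigned char *make_byte_buffer(unsigned bits)
{
    return malloc(octets_per_byte(bits));
}

/* Store VALUE into BUF, most significant octet first (assumed order). */
static void encode_byte(uint64_t value, unsigned bits, unsigned char *buf)
{
    size_t n = octets_per_byte(bits);
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(value >> (8 * (n - 1 - i)));
}

/* Inverse of encode_byte. */
static uint64_t decode_byte(unsigned bits, const unsigned char *buf)
{
    uint64_t value = 0;
    size_t n = octets_per_byte(bits);
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | buf[i];
    return value;
}
```

With an element type such as (unsigned-byte 32) the buffer needs four octets,
which is the case a small stack-allocated array sized for character encodings
does not anticipate.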
This will allow us to transcode characters to bytes and vice versa. This is
necessary to implement conductive UNREAD-CHAR and UNREAD-BYTE, but will also
allow us to add low-level parsers for binary objects in the future.
This is to allow working with sequence streams where the vector may change after
the stream has been created.
When the user specifies :END to be some fixed value, then we keep that
promise, but when :END is NIL, then we always consult the vector fillp.
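The rule above can be sketched like this. It is an illustration under stated
assumptions, not the ECL implementation: struct vec, struct seq_stream and
SEQ_END_UNSET are hypothetical stand-ins for the Lisp vector, the sequence
stream and :END NIL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a Lisp vector with a fill pointer. */
struct vec {
    size_t dim;    /* allocated capacity */
    size_t fillp;  /* active length (fill pointer) */
};

#define SEQ_END_UNSET ((size_t)-1)  /* models :END NIL */

/* Hypothetical sequence stream reading from a vector. */
struct seq_stream {
    struct vec *vector;
    size_t pos;
    size_t end;  /* fixed limit, or SEQ_END_UNSET */
};

/* A fixed :END is honoured as given; otherwise the current fill pointer
 * is consulted on every call, so later changes to the vector are seen. */
static size_t seq_stream_limit(const struct seq_stream *s)
{
    assert(s != NULL && s->vector != NULL);
    return (s->end == SEQ_END_UNSET) ? s->vector->fillp : s->end;
}

static int seq_stream_at_eof(const struct seq_stream *s)
{
    return s->pos >= seq_stream_limit(s);
}
```

A stream created with an open end stops reporting end-of-file once more
elements are pushed onto the vector, while a stream with a fixed end does not.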
Previously when we couldn't convert the vector element type to a character,
creating sequence streams failed even when we were expecting the binary stream.
From now on it is possible to use vectors whose upgraded element type is any
integer type.
SEQ_{INPUT,OUTPUT}* -> SEQ_STREAM*
Don't use IO_STREAM_ELT_TYPE in sequences and define SEQ_STREAM_ELT_TYPE
instead to avoid ambiguity.
This is a cleanup that signifies similarities between both objects.
1. ecl_peek_char had an outdated comment, presumably from before we introduced
stream dispatch tables - that comment has been removed.
2. fix erroneous specializations
- of STREAM-UNREAD-CHAR
By mistake we had two methods specialized to ANSI-STREAM, while one was
clearly meant to specialize to T (in order to call BUG-OR-ERROR).
- of winsock winsock_stream_output_ops
stream peek char was set to ecl_generic_peek_char instead of
ecl_not_input_read_char
3. change struct ecl_file_ops definition
a) the ecl_file_ops structure reorders some operations to always feature READ
before WRITE (for consistency)
b) we are more precise in dispatch function declaration and specify the return
type to be ecl_character where applicable
The function operates on a base_string, but if it was supplied with an extended
string, then the ecl_base_char array became ecl_character, and that led to bad
copies. To fix it we ensure that the passed string is first coerced to a cstring.
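The coercion step can be sketched as below. This is a hypothetical helper, not
the ECL function: to_cstring and BASE_CHAR_LIMIT are names invented for the
example, and the limit of 256 is an assumption about what counts as a base
character.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BASE_CHAR_LIMIT 256u  /* assumed upper bound for base characters */

/* Narrow an array of wide characters into a NUL-terminated C string.
 * Returns NULL when a character does not fit (or allocation fails), so
 * the caller never copies wide elements as if they were single octets.
 * The caller frees the result. */
static char *to_cstring(const uint32_t *chars, size_t len)
{
    assert(chars != NULL || len == 0);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        if (chars[i] >= BASE_CHAR_LIMIT) {
            free(out);
            return NULL;
        }
        out[i] = (char)chars[i];
    }
    out[len] = '\0';
    return out;
}
```

Converting element by element, instead of copying the raw storage, is what
avoids the mismatch between one-octet and wider character arrays.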
This commit splits one gargantuan file into numerous orthogonal stream types:
- strm_os -- c99/posix/windows streams
- strm_clos -- gray streams
- strm_string -- string streams
- strm_composite -- composite streams (echo, broadcast, synonym ...)
- strm_common -- common errors, byte manipulation routines etc
- strm_sequence -- si_write_sequence and si_read_sequence (fast I/O)
- strm_eformat -- external format processing routines (unicode etc)
After this split file.d contains only open/close. This will be the place to
dispatch to a virtual filesystem.
In at least two of the four cases, continuing from the error led to another error:
- for concatenated stream we've tried to dispatch on CAR that was NIL (segfault)
- for string stream we've decremented the position below 0
Also change these functions to macros ecl_unread_* (defined in internal.h) that
expand to FEerror. This is in anticipation of splitting file.d.
Atomics are needed by stacks.
Replace ecl_atomic_push -> ecl_atomic_psh that takes as an argument a
preallocated cons. ecl_atomic_push is replaced with a macro.
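A push that takes a preallocated cell can be sketched with C11 atomics as
follows. This is an illustration only: struct cell and atomic_push_cell are
hypothetical names, and the real code operates on tagged Lisp objects rather
than plain C pointers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical cons cell. */
struct cell {
    void *car;
    struct cell *cdr;
};

/* Push a caller-allocated cell onto a lock-free list.
 * Nothing is allocated here, so it is usable where consing is not. */
static void atomic_push_cell(struct cell *_Atomic *slot, struct cell *fresh)
{
    assert(slot != NULL && fresh != NULL);
    struct cell *head = atomic_load(slot);
    do {
        fresh->cdr = head;  /* link to the head we expect to replace */
        /* On failure HEAD is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(slot, &head, fresh));
}
```

A convenience wrapper that allocates the cell and then calls this function
gives back the original interface, which is the role the macro plays.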
We've hit 3/4 of the limit quite fast, so this pull request limits pipelines to
run only when we commit to the branch develop and on merge requests.
On OpenBSD, FILE is opaque (starting from upcoming OpenBSD 7.8).
FILE_CNT() macro is implementable using `size_t __freadahead(FILE *stream)` function (provided for gnulib compat).
With recent versions of the android NDK, there is no need to create a
separate toolchain anymore. Moreover, the LDFLAGS and CPPFLAGS are
obsolete. Also, we can't call the C compiler with a `-Dandroid`
argument as we used to since that conflicts with uses of the `android`
identifier within the header files of the android C standard library.
Therefore, we change from `thehost=android` to `thehost=ANDROID` in
uppercase. Finally, make the test script run more reliably by starting
an adb server before we change TMPDIR.
This is needed to allow for cross compiling from a compiler with a
different set of configure options (e.g. compiling for a target which
doesn't support complex floats from a host which does).