Commit graph

12 commits

Author SHA1 Message Date
Daniel Kochmański
41f52d8d0f streams: bivalent stream signals a condition for bytes out of range
Sometimes a byte may fall outside the character code range. In that case, when
we read the char, the system signals a condition.

Alternatively (and that's the behavior before this commit) we could return the
character #\Nul. That was done by virtue of ECL_CHAR_CODE skipping tag bytes, so
the returned NIL was treated as 0.
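
The range check described above can be sketched in plain C. This is only an
illustration, not ECL code: SKETCH_CHAR_CODE_LIMIT and sketch_byte_to_char are
hypothetical names, and the limit is assumed to be the Unicode code space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit for illustration: one past the last Unicode code point. */
#define SKETCH_CHAR_CODE_LIMIT 0x110000u

/* Returns true and stores the code when the byte is a valid character code.
   Returns false otherwise; a real stream would signal a condition here
   instead of silently yielding character 0. */
bool sketch_byte_to_char(uint64_t byte, uint32_t *code_out)
{
    if (byte >= SKETCH_CHAR_CODE_LIMIT)
        return false;
    *code_out = (uint32_t)byte;
    return true;
}
```

The point of the explicit failure path is that an out-of-range byte is reported
rather than mapped to #\Nul.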
2025-08-11 10:01:40 +02:00
Daniel Kochmański
43fef5fad8 streams: address a possible segfault in sequence streams
Byte streams transcoding to :ucs-2 and :ucs-4 don't call
ecl_set_stream_elt_type, which effectively leaves .byte_buffer uninitialized.
Moreover, the functions seq_in_read_byte8 and seq_out_write_byte8 assume the
vector type to be octet based, and they increment the stream position and test
for its limit accordingly.

That means that ecl_binary_read_byte and ecl_binary_write_byte calls would
segfault when seq_in_read_byte8 and seq_out_write_byte8 are called.

Both conditions could be easily mitigated by initializing .byte_buffer manually
and fixing seq_*_*_byte8 functions to account for the byte size, but there is no
need for that, because for these streams we are not using

ecl_binary_*_byte
ecl_eformat_*_byte

so byte8 functions are not called and .byte_buffer is not used.
2025-08-11 10:01:40 +02:00
Daniel Kochmański
26a22057e5 streams: introduce direct bivalent sequence streams
Previously sequence streams always needed to go through the eformat and binary
encoders and decoders -- if bytes were too big, then we couldn't create sequence
streams from them.

After this commit it is possible to pass a character stream or a byte stream and
use it as a bivalent stream without a roundtrip for encoding and decoding.
2025-08-11 10:01:40 +02:00
Daniel Kochmański
b7eaf35502 streams: move byte_stack to strm_os and improve UNREAD-BYTE
The .byte_stack is used only by files to:
a) unread a single octet when we use fallback LISTEN implementation
b) unread bytes that make a character when UNREAD-CHAR is used

The latter is important to transcode characters from one external format to
another (see the test external-format.0003-transcode-read-char).

This commit improves the function unread-byte to do the same, bringing bivalent
streams almost to parity with regard to that implementation (see next commit).

That makes the implementation of eformat cleaner, .byte_stack more
self-contained, and saves us consing a new byte stack for sequence streams
(where it was simply ignored, not to mention not entirely correct, because we
used the .byte_stack length to decrement the pointer position while the byte
could have more bits than one octet).

Other optimizations that could be done here:
- make the byte stack an adjustable vector to avoid consing on each unread
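
A minimal sketch of how unreading the octets of one character could work with a
stack, written in plain C. The type sketch_byte_stack, its fixed capacity, and
both functions are hypothetical and are not taken from ECL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STACK_MAX 8  /* hypothetical capacity, enough for one character */

typedef struct {
    uint8_t bytes[SKETCH_STACK_MAX];
    size_t  depth;
} sketch_byte_stack;

/* Push the encoded octets of one character in reverse order, so that the
   first octet of the character ends up on top of the stack. */
void sketch_unread_octets(sketch_byte_stack *s, const uint8_t *octets, size_t n)
{
    assert(s->depth + n <= SKETCH_STACK_MAX);
    while (n > 0)
        s->bytes[s->depth++] = octets[--n];
}

/* Pop one octet. Returns -1 when the stack is empty, in which case the
   caller would read from the underlying file instead. */
int sketch_pop_octet(sketch_byte_stack *s)
{
    return s->depth ? s->bytes[--s->depth] : -1;
}
```

Because the octets come back in their original order, a decoder for a different
external format can consume them again, which is what transcoding after
UNREAD-CHAR relies on.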
2025-08-11 10:01:40 +02:00
Daniel Kochmański
ca845457f8 streams: switch to the new binary reader/writer implementation
We drop the varying generic-read/write variants in favor of using the binary
encoders introduced in earlier commits.

This will allow for unified handling of unread bytes and characters and
transcoding both in bivalent streams.
2025-08-11 10:01:40 +02:00
Daniel Kochmański
c7f534771a streams: sequence input stream follows the vector length
This is to allow working with sequence streams where the vector may change after
the stream has been created.

When the user specifies :END to be some fixed value, then we keep that
promise, but when :END is NIL, then we always consult the vector fillp.
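
The rule above can be sketched in plain C as follows. All names here
(sketch_vector, sketch_seq_input, sketch_limit, sketch_at_eof) are hypothetical
and only model the described behavior.

```c
#include <stddef.h>

typedef struct {
    size_t fill_pointer;   /* current logical length of the vector */
} sketch_vector;

typedef struct {
    sketch_vector *vector;
    size_t position;
    int    has_fixed_end;  /* nonzero when the user supplied :END */
    size_t fixed_end;
} sketch_seq_input;

/* A fixed :END is honored; otherwise the limit is re-read from the vector
   on every call, so later growth of the vector becomes visible. */
size_t sketch_limit(const sketch_seq_input *s)
{
    return s->has_fixed_end ? s->fixed_end : s->vector->fill_pointer;
}

int sketch_at_eof(const sketch_seq_input *s)
{
    return s->position >= sketch_limit(s);
}
```

With this shape, a stream that reported end of file can yield more elements
once the vector's fill pointer advances, unless :END was given explicitly.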
2025-08-11 10:01:40 +02:00
Daniel Kochmański
086f0a4bef streams: allow for sequence streams to handle all byte arrays
Previously when we couldn't convert the vector element type to a character,
creating sequence streams failed even when we were expecting the binary stream.
From now on it is possible to use vectors whose upgraded element type is any
integer type.
2025-08-11 10:01:40 +02:00
Daniel Kochmański
ea11e2c433 streams: rename common sequence stream accessors
SEQ_{INPUT,OUTPUT}* -> SEQ_STREAM*

Don't use IO_STREAM_ELT_TYPE in sequences and define SEQ_STREAM_ELT_TYPE
instead to avoid ambiguity.

This is a cleanup that signifies similarities between both objects.
2025-08-11 10:01:40 +02:00
Daniel Kochmański
a8e57c60a5 streams: implement new interfaces for unreading and peeking bytes
ecl_file_ops has two new members:

  void (*unread_byte)(cl_object strm, cl_object byte);
  cl_object (*peek_byte)(cl_object strm);

C API additions:

  void ecl_unread_byte (cl_object byte, cl_object strm)
  cl_object ecl_peek_byte (cl_object strm)

  si_unread_byte(cl_object strm, cl_object byte)    [1]
  si_peek_byte(cl_object strm, cl_object byte)      [2]

Lisp API additions:

  (ext:unread-byte stream byte) :: integer          [1]
  (ext:peek-byte   stream byte) :: (or integer nil) [2]

  (gray:stream-unread-byte stream byte) :: null
  (gray:stream-peek-byte stream) :: (or integer :eof)

We implement a "generic" version of unread-byte by storing it in a new slot
last_byte.
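
A minimal sketch of the "generic" approach, assuming a single pending slot and
assuming that peeking is built from a read followed by an unread. The struct
and function names are hypothetical; only the idea of a last_byte slot comes
from the message above.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const unsigned char *data;
    size_t length;
    size_t position;
    int    last_byte;      /* pending unread byte, or -1 when the slot is empty */
} sketch_stream;

/* Deliver the pending byte first, then fall back to the data; -1 means EOF. */
int sketch_read_byte(sketch_stream *s)
{
    if (s->last_byte >= 0) {
        int b = s->last_byte;
        s->last_byte = -1;
        return b;
    }
    return s->position < s->length ? s->data[s->position++] : -1;
}

/* Only one byte can be pending at a time in this sketch. */
void sketch_unread_byte(sketch_stream *s, int byte)
{
    assert(s->last_byte < 0);
    s->last_byte = byte;
}

/* Assumption: peek is read followed by unread. */
int sketch_peek_byte(sketch_stream *s)
{
    int b = sketch_read_byte(s);
    if (b >= 0)
        sketch_unread_byte(s, b);
    return b;
}
```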
2025-08-11 10:01:37 +02:00
Daniel Kochmański
407fe456fe streams: make ecl_read_byte return OBJNULL on EOF
This is to allow for sequence streams to return arbitrary objects (when
appropriately constructed) without many changes.
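
The motivation can be illustrated with a small C sketch: when EOF is reported
with a null pointer, every real object, including one that stands for NIL,
remains a valid stream element. The types and names below are hypothetical and
only mimic the idea of an OBJNULL sentinel.

```c
#include <stddef.h>

typedef struct sketch_object { int tag; } *sketch_ref;

/* Hypothetical sentinel: a null reference that no real object can equal. */
#define SKETCH_OBJNULL ((sketch_ref)NULL)

typedef struct {
    sketch_ref *items;
    size_t length;
    size_t position;
} sketch_obj_stream;

/* Returns the next element, or SKETCH_OBJNULL at the end of the sequence. */
sketch_ref sketch_read(sketch_obj_stream *s)
{
    return s->position < s->length ? s->items[s->position++] : SKETCH_OBJNULL;
}
```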
2025-08-11 10:01:37 +02:00
Daniel Kochmański
431132e4d1 streams: ecl_file_ops cleanup and some minor fixes
1. ecl_peek_char had an outdated comment, presumably from before we introduced
   stream dispatch tables; that comment has been removed.

2. fix erroneous specializations

   - of STREAM-UNREAD-CHAR

   By mistake we had two methods specialized to ANSI-STREAM, while one was
   clearly meant to specialize to T (in order to call BUG-OR-ERROR).

   - of winsock winsock_stream_output_ops

     stream peek char was set to ecl_generic_peek_char instead of
     ecl_not_input_read_char

3. change struct ecl_file_ops definition

a) the ecl_file_ops structure changes the order of some operations to always
   feature READ before WRITE (for consistency)

b) we are more precise in dispatch function declaration and specify the return
   type to be ecl_character where applicable
2025-08-11 10:01:37 +02:00
Daniel Kochmański
6ce9c22dda stream: split file.d into different stream types
This commit splits one gargantuan file into numerous orthogonal stream types:

- strm_os -- c99/posix/windows streams
- strm_clos -- gray streams
- strm_string -- string streams
- strm_composite -- composite streams (echo, broadcast, synonym ...)
- strm_common -- common errors, byte manipulation routines etc
- strm_sequence -- si_write_sequence and si_read_sequence (fast I/O)
- strm_eformat -- external format processing routines (unicode etc)

After this split file.d contains only open/close. This will be the place to
dispatch to a virtual filesystem.
2025-07-26 16:59:42 +02:00