Why doesn't IO::Socket::Async emit a trailing "a"?

Question

I was wondering if anyone knows how to get around the encoding behavior of IO::Socket::Async, particularly the drawback described in this passage:

For example, if the UTF-8 encoding is being used and the last byte in the packet decoded to "a", this would not be emitted since the next packet may include a combining character that should form a single grapheme together. Control characters (such as \n) always serve as grapheme boundaries, so any text-based protocols that use newlines or null bytes as terminators will not need special consideration.

This is currently causing my sockets to omit the last character of each message, and I am not sure how to work around it. I tried converting the connection to a Channel and feeding a dummy \n into it to simulate end of input for the message, but that did not work. How can I work around this quirk of UTF-8 decoding?

Here is a minimal example that reproduces this:

sub listen(Int $port) {
  react {
    whenever IO::Socket::Async.listen('0.0.0.0', $port) -> $connection {
      whenever $connection.Supply -> $data {
        say $data;
        $connection.print: $data;
      }
    }
  }
}

listen(9999);

Now if you hit port 9999 on your local machine with any data that does not end with \n, you will see that the last character is not emitted.

Answer 1

It's not a "drawback"; it's just Raku reflecting how Unicode works. If you know you only need to handle ASCII or Latin-1, then specify that:

whenever $connection.Supply(:enc<ascii>) -> $data { # or :enc<latin-1>
    ...
}

If you want to handle Unicode text, then you have to deal with the fact that receiving, for example, the codepoint for the letter "a" does not give enough information to pass along a complete character, since the next codepoint, arriving in the next packet, might be a combining character, such as an acute accent to be placed on the "a". Note that a Raku Str is a character-level (grapheme-level) data structure; in other languages, strings are often sequences of bytes or codepoints, which creates different problems that are largely invisible to those who only care about English text.
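To make the combining-character point concrete, here is a small sketch (the variable names are only for illustration) showing that an "a" followed by U+0301 COMBINING ACUTE ACCENT counts as a single character in a Raku Str, which is why a decoder cannot safely hand over a trailing "a" on its own:

```raku
# A lone "a", and an "a" followed by U+0301 COMBINING ACUTE ACCENT.
my $plain    = "a";
my $accented = "a\x[0301]";

# .chars counts graphemes, which is the level a Raku Str works at.
say $plain.chars;        # 1
say $accented.chars;     # 1: the accent merged into the same grapheme

# .NFD decomposes the string back into its individual codepoints.
say $accented.NFD.elems; # number of codepoints after decomposition
```

If the "a" had been emitted as soon as it arrived, the accent in the following packet would have had nothing to attach to.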

Any well-designed network protocol will provide a way to know when the end of the text content has been reached. Some protocols, such as HTTP, explicitly specify the byte length of the content, so one can work at the byte level (:bin) and decode the result after seeing that many bytes. Others might use connection close or line breaks.
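As a rough sketch of the byte-level approach, the listener from the question could be adapted along these lines. It assumes the simplest framing mentioned above, namely that the peer closes the connection when the message is complete, and the $buffer variable is introduced here purely for illustration:

```raku
sub listen(Int $port) {
    react {
        whenever IO::Socket::Async.listen('0.0.0.0', $port) -> $connection {
            # Raw bytes received so far on this connection (illustrative name).
            my $buffer = Buf.new;

            # :bin makes the Supply emit byte buffers instead of decoded
            # strings, so nothing is held back waiting for a combining character.
            whenever $connection.Supply(:bin) -> $bytes {
                $buffer ~= $bytes;            # accumulate without decoding
                $connection.write: $bytes;    # echo the bytes back unchanged

                LAST {
                    # Assumed framing: the peer closing the connection marks
                    # the end of the message, so decode everything at once.
                    say $buffer.decode('utf-8');
                    $connection.close;
                }
            }
        }
    }
}
```

Calling listen(9999) starts the server as in the question. With a length-based protocol, the decode step would instead run once the announced number of bytes has arrived.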

In conclusion, the string semantics of IO::Socket::Async (and elsewhere in Raku) aren't themselves a problem, but they may expose design problems in protocols.

solution1
7 ACCEPTED 2023-01-09 00:34:29