Line breaking behaviour of Unicode characters (e.g. 🦄)?

Background

Each of the following examples is a single line of text, yet you should see them wrap differently.

🦄 w/equals

==============================🦄============================================================================================================

🦄 w/dots

..................................🦄............................................................................................................................................................................................................

⚠️ w/both for reference

==============================⚠️========================================================================================================================================================================================================================

..................................⚠️................................................................................................................................................................................................................................................................................

Questions:

  1. Why does the 🦄 break the line, while the ⚠️ does not?
  2. On Chrome 63/Safari 11.0, why does wrapping in "=" cause the 🦄 to stay on the top line, while wrapping in "." causes the 🦄 to drop down to the second line?

Recreated in JSFiddle in the following container:

div {
  width: 200px;
  display: block;
  ...
}

It has to do with the characters' line break properties. Warning Sign and Equals Sign are in the line break category Alphabetic (AL), Unicorn Face is in the category Ideographic (ID), and Full Stop is in the category Infix_Numeric (IS).

If we consult UAX #14: Unicode Line Breaking Algorithm, we can see that ideographic characters provide line break opportunities before and after, so lines are free to break around them. Meanwhile, alphabetic characters are supposed to “stick” together, so no line breaks should occur between them. Since ⚠️ is alphabetic, it glues to the equals signs, and the line only breaks once there is no more room to expand. The ideographic 🦄, however, allows line breaks, so the text wraps at the unicorn: that is the only place where the otherwise unbreakable run of equals signs can be split.

Now, as to why full stop behaves differently than the equals sign: Infix numeric characters are supposed to glue to any numeric characters that directly follow them. Since that isn't the case here, another rule applies:

When not used in a numeric context, infix separators are sentence-ending punctuation. Therefore they always prevent breaks before.

This means that the line cannot break after 🦄, since the following full stop is supposed to glue to it. The only remaining break opportunity is before the 🦄, so the unicorn drops down to the start of the next line instead.
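The behaviour described above can be reproduced with a small sketch. It is not an implementation of UAX #14: it hardcodes the four line break classes named in this answer instead of reading them from the Unicode data files, it handles only the AL, ID and IS classes, and it pretends that every character is one unit wide. The names `LINE_BREAK_CLASS`, `may_break_between`, `break_opportunities` and `wrap` are made up for this illustration.

```python
# Line break classes as stated in the answer above (hardcoded assumption,
# not looked up in the Unicode Character Database).
LINE_BREAK_CLASS = {
    "=": "AL",           # Equals Sign: Alphabetic
    "\u26A0": "AL",      # Warning Sign: Alphabetic
    "\U0001F984": "ID",  # Unicorn Face: Ideographic
    ".": "IS",           # Full Stop: Infix_Numeric
}

VARIATION_SELECTOR_16 = "\uFE0F"  # emoji presentation selector, ignored here


def may_break_between(before, after):
    """Simplified pair check for the AL, ID and IS classes only."""
    a = LINE_BREAK_CLASS[before]
    b = LINE_BREAK_CLASS[after]
    if b == "IS":
        return False  # never break before an infix separator
    if b == "AL" and a in ("AL", "IS"):
        return False  # alphabetic runs stick together; IS glues to a following AL
    return True       # everything else (e.g. around an ideograph) may break


def break_opportunities(text):
    """Return the indices at which a line break is allowed."""
    chars = [c for c in text if c != VARIATION_SELECTOR_16]
    return [i for i in range(1, len(chars))
            if may_break_between(chars[i - 1], chars[i])]


def wrap(text, width):
    """Greedy wrap, assuming each character is one unit wide.

    A line that has no break opportunity simply overflows, as it does
    in the fixed-width container from the question.
    """
    chars = [c for c in text if c != VARIATION_SELECTOR_16]
    breaks = break_opportunities(text)
    lines, start = [], 0
    while len(chars) - start > width:
        fitting = [b for b in breaks if start < b <= start + width]
        later = [b for b in breaks if b > start + width]
        if fitting:
            cut = max(fitting)   # last opportunity that still fits
        elif later:
            cut = min(later)     # overflow up to the next opportunity
        else:
            break                # no opportunity left: overflow to the end
        lines.append("".join(chars[start:cut]))
        start = cut
    lines.append("".join(chars[start:]))
    return lines


if __name__ == "__main__":
    for sample in ("===\U0001F984======", "...\U0001F984......", "===\u26A0\uFE0F======"):
        print(wrap(sample, 5))
```

With a width of 5 units, the equals sample keeps the 🦄 at the end of the first line, the full stop sample pushes it to the start of the second line, and the ⚠️ sample stays on a single overflowing line, which matches what Chrome and Safari show in the question.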

Keep in mind that most of these line break categories are tailorable. They are default values that may be very useful for most applications, but can be overridden if different behaviour is more desirable. In Firefox, for example, the line breaks before 🦄 in both the full stop and the equals sign example.
