Let's combine a regular i with a combining acute accent, and normalize the result (using Python's unicodedata.normalize):
from unicodedata import normalize
normalize("NFC", "i\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
b'\\N{LATIN SMALL LETTER I WITH ACUTE}'
As expected: a small i with the dot swapped out for an acute accent, í.
Let's do the same with a dotless i:
from unicodedata import normalize
normalize("NFC", "\N{LATIN SMALL LETTER DOTLESS I}\N{COMBINING ACUTE ACCENT}").encode("ascii", "namereplace")
b'\\N{LATIN SMALL LETTER DOTLESS I}\\N{COMBINING ACUTE ACCENT}'
As you can see, the two code points do not compose into a single character. Other implementations, e.g. this one, do the same.
Why not? Is this consistent with the Unicode standard?
From The Unicode Standard, Version 14.0, "Diacritics on i and j" (emphasis mine):
A dotted (normal) i or j followed by some common nonspacing marks above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ ≠ ı + ¨).
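The non-equivalence described in that passage can be checked against the decomposition data that ships with Python. The following is only a sketch: canonical_compositions_starting_with is a hypothetical helper written for this illustration, not part of unicodedata, and the results depend on the Unicode version bundled with your interpreter (unicodedata.unidata_version).

```python
import sys
import unicodedata

# The precomposed character decomposes canonically to dotted i + acute.
print(unicodedata.decomposition("\N{LATIN SMALL LETTER I WITH ACUTE}"))

# The dotless i has no decomposition of its own.
print(repr(unicodedata.decomposition("\N{LATIN SMALL LETTER DOTLESS I}")))


def canonical_compositions_starting_with(base):
    """Hypothetical helper: list every character whose canonical
    decomposition begins with the code point of `base`."""
    target = f"{ord(base):04X}"
    found = []
    for cp in range(sys.maxunicode + 1):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Compatibility mappings start with a tag such as <font>; skip them,
        # because NFC only composes canonical pairs.
        if fields and not fields[0].startswith("<") and fields[0] == target:
            found.append(chr(cp))
    return found


dotted = canonical_compositions_starting_with("i")
dotless = canonical_compositions_starting_with("\N{LATIN SMALL LETTER DOTLESS I}")
print(len(dotted), len(dotless))
```

If dotless comes back empty, no precomposed character has a canonical decomposition that begins with the dotless i, so NFC has nothing to compose the pair into and leaves it as two code points.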