
Unicode ligature SS

I am parsing an XML document containing characters in the Private Use Area of the Sabon font. These characters have to be replaced because the font has to be changed to Times New Roman. So far, everything is fine.

Now I need a replacement for a character which looks like SS (a double s, like a ligature of two s). I inspected Times New Roman and didn't find a corresponding character. Does anyone know whether there is such a thing in Unicode?

This is a bit of a mystery, but I think the glyph you are seeing is a small-capital glyph for “ß” U+00DF LATIN SMALL LETTER SHARP S, often called the “German double s”. For the word you mention in a comment, this would make little sense: Broussonet was a French naturalist, French does not use “ß”, and German does not use “ß” in foreign names, so the few occurrences of “Broußonet” that Google finds must be odd misspellings.

The copied string contains Private Use code points that Sabon seems to use for small capitals. This is somewhat odd, since nowadays small capitals are normally included as glyph variants selectable through OpenType features rather than assigned to Private Use code points, which are non-portable by definition.

This still does not explain what is happening, since the string contains “Broussonet” in that sense, with “ss” represented by two copies of the Private Use code point that Sabon uses for small-cap “s”. Presumably, some conversion between “ss” and “ß” is taking place somewhere. Anyway, the “character” in your second comment is U+E03F, a Private Use code point apparently used in Sabon for small-cap “ß” (CFF glyph name germandbls.sc).
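For the replacement step itself, a minimal Python sketch might look like the following. It assumes a hand-built mapping table: only U+E03F comes from the discussion above, and the choice of plain “ss” as its replacement is an assumption that suits a French name like Broussonet (use “ß” for genuinely German text). The names `PUA_REPLACEMENTS` and `replace_private_use` are hypothetical. Any other Sabon Private Use code points would have to be added after inspecting the font, so the function also reports the ones it could not map.

```python
import unicodedata

# Hypothetical mapping table. Only U+E03F is taken from the answer above;
# mapping it to "ss" (rather than "ß") is an assumption for non-German names.
PUA_REPLACEMENTS = {
    "\ue03f": "ss",
}


def replace_private_use(text, mapping=PUA_REPLACEMENTS):
    """Replace mapped Private Use characters and collect the unmapped ones.

    Returns a tuple (new_text, unmapped), where unmapped is a sorted list of
    Private Use characters (general category "Co") that had no replacement
    and were left in place.
    """
    out = []
    unmapped = set()
    for ch in text:
        if ch in mapping:
            out.append(mapping[ch])
        else:
            if unicodedata.category(ch) == "Co":
                unmapped.add(ch)
            out.append(ch)
    return "".join(out), sorted(unmapped)


if __name__ == "__main__":
    fixed, leftover = replace_private_use("Brou\ue03fonet")
    print(fixed)
    print([f"U+{ord(c):04X}" for c in leftover])
```

Reporting the leftover code points instead of silently dropping them makes it easier to extend the table as further Sabon-specific characters turn up in the XML.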

However, if the text is interpreted as really being in uppercase, with the letters after the first rendered in small caps, and if “SS” is then interpreted as, or replaced by, the uppercase form of “ß”, then it is “ẞ” U+1E9E LATIN CAPITAL LETTER SHARP S. In normal German orthography, uppercasing maps “ß” to “SS” (two copies of the normal letter “S”), but Unicode now also has U+1E9E, to meet the need to preserve spelling differences such as Strauss vs. Strauß when names are written in all-uppercase. Modern versions of Times New Roman have a glyph for “ẞ”; old versions do not (U+1E9E was added in Unicode 5.1, in April 2008).
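The case mappings described here can be checked quickly in Python, whose `str.upper()` and `str.lower()` follow the Unicode case mappings. The helper `preserving_upper` is a hypothetical name introduced only for illustration; it is a sketch of one way to keep the Strauss/Strauß distinction in all-uppercase text, not a standard function.

```python
import unicodedata

SMALL_SHARP_S = "\u00df"    # ß
CAPITAL_SHARP_S = "\u1e9e"  # ẞ


def preserving_upper(text):
    """Uppercase text, mapping ß to ẞ (U+1E9E) instead of the default SS."""
    return text.replace(SMALL_SHARP_S, CAPITAL_SHARP_S).upper()


if __name__ == "__main__":
    print(unicodedata.name(SMALL_SHARP_S))
    print(unicodedata.name(CAPITAL_SHARP_S))

    # Default uppercasing expands ß to SS, so both spellings collapse.
    print("Strauß".upper() == "Strauss".upper())

    # Lowercasing the capital sharp s gives back ß.
    print(CAPITAL_SHARP_S.lower() == SMALL_SHARP_S)

    # With the explicit substitution the two names stay distinct.
    print(preserving_upper("Strauß") != preserving_upper("Strauss"))
```

Whether “ẞ” actually renders still depends on the installed font version, as noted above for Times New Roman.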


 