What caseless matching algorithm does Autocad use to compare layer names?

Autocad DXF and DWG files use Unicode strings to identify layers. I've determined experimentally that Autocad must employ some sort of case folding and normalisation (Autocad considers 'groß' and 'GROSS' to be the same, and 'Am\U+00e9lie' and 'Ame\U+0301lie' to be the same). I'd like to know in my own software whether two layer names are the same according to Autocad. The Default Caseless Matching algorithm from the Unicode standard seems to give me the right answer, but I'd like to be sure.

  1. Can anyone confirm that Default Caseless Matching is the algorithm used by Autocad? Or, if it isn't, what is?

  2. Are there test inputs I can use to distinguish between different caseless matching algorithms?

I don't have a definite answer, but the Unicode standard defines four algorithms for caseless matching:

  1. Default caseless matching (D144): This uses only (full) case folding, with no normalization. Since you mentioned that Am\U+00e9lie and Ame\U+0301lie match, this variant can definitely be ruled out.

  2. Canonical caseless matching (D145): This uses canonical (NFC or NFD) normalization in addition to case folding.

  3. Compatibility caseless matching (D146): This uses the "compatibility" (NFKC or NFKD) normalization form in addition to case folding.

  4. Identifier caseless matching (D147): Like compatibility caseless matching, but it also ignores Default Ignorable characters.
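To make the differences concrete, here is a rough Python sketch of the four definitions using only the standard library. The function names are mine, `str.casefold()` stands in for the standard's full case folding, and the D147 variant is only an approximation: Python has no built-in NFKC_Casefold, so it strips a small hand-picked subset of Default Ignorable code points (an assumption, not the full property) before applying the D146 steps.

```python
import unicodedata

# Partial, hand-picked subset of Default_Ignorable_Code_Point (assumption:
# the real property covers many more code points than these).
SOME_DEFAULT_IGNORABLES = {
    "\u00ad", "\u034f", "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",
}


def default_caseless(x, y):
    """D144: full case folding only, no normalization."""
    return x.casefold() == y.casefold()


def _canonical_key(s):
    # NFD(toCasefold(NFD(s)))
    folded = unicodedata.normalize("NFD", s).casefold()
    return unicodedata.normalize("NFD", folded)


def canonical_caseless(x, y):
    """D145: case folding plus canonical normalization."""
    return _canonical_key(x) == _canonical_key(y)


def _compatibility_key(s):
    # NFKD(toCasefold(NFKD(toCasefold(NFD(s)))))
    s = unicodedata.normalize("NFD", s).casefold()
    s = unicodedata.normalize("NFKD", s).casefold()
    return unicodedata.normalize("NFKD", s)


def compatibility_caseless(x, y):
    """D146: case folding plus compatibility normalization."""
    return _compatibility_key(x) == _compatibility_key(y)


def identifier_caseless_approx(x, y):
    """Approximation of D147: drop some Default Ignorables, then D146."""
    def strip(s):
        return "".join(c for c in s if c not in SOME_DEFAULT_IGNORABLES)
    return _compatibility_key(strip(x)) == _compatibility_key(strip(y))


if __name__ == "__main__":
    pairs = [
        ("gro\u00df", "GROSS"),
        ("Am\u00e9lie", "Ame\u0301lie"),
        ("\u0133", "ij"),
        ("A\u00adB", "AB"),
    ]
    tests = [default_caseless, canonical_caseless,
             compatibility_caseless, identifier_caseless_approx]
    for x, y in pairs:
        print(ascii(x), ascii(y), [t(x, y) for t in tests])
```

Each pair in the demo is accepted by one more algorithm than the previous pair, which is what makes them useful as probes. Results may vary slightly with the Unicode version bundled with your Python.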

So I'd suggest the following additional tests:

  • If \U+0133 (LATIN SMALL LIGATURE IJ, which has a compatibility mapping) and ij match, then Autocad seems to use compatibility normalization, and canonical caseless matching (D145) can be ruled out.

  • If A\U+00adB (SOFT HYPHEN, which has the property Default_Ignorable_Code_Point) and AB match, then Autocad seems to ignore Default Ignorable characters, and compatibility caseless matching (D146) can be ruled out.
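Combined with the normalization test from the question, these two tests form a small decision table. The sketch below is just one way to write that table down; the function name and the returned labels are my own, and a result only means "consistent with", not "proven to be".

```python
def narrow_down(accents_match, ligature_match, soft_hyphen_match):
    """Map observed match results to the Unicode algorithm they fit.

    accents_match:     'Am\\U+00e9lie' vs 'Ame\\U+0301lie' compare equal
    ligature_match:    '\\U+0133' vs 'ij' compare equal
    soft_hyphen_match: 'A\\U+00adB' vs 'AB' compare equal
    """
    observed = (accents_match, ligature_match, soft_hyphen_match)
    table = {
        (False, False, False): "D144 (default)",
        (True, False, False): "D145 (canonical)",
        (True, True, False): "D146 (compatibility)",
        (True, True, True): "D147 (identifier)",
    }
    # Any other combination does not fit one of the four definitions.
    return table.get(observed, "none of D144-D147")


if __name__ == "__main__":
    # The question already reports that the accent test matches.
    print(narrow_down(True, False, False))
    print(narrow_down(True, False, True))
```

The second call shows the fallback: ignoring the soft hyphen without applying compatibility mappings fits none of the four definitions.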

It's of course possible that Autocad uses none of the Unicode algorithms, but the tests above should help to narrow it down. Please consider posting any additional findings to help other users.

I intercepted the API calls and found that Autocad 2018 on Windows uses CompareStringW(LOCALE_USER_DEFAULT, NORM_IGNORECASE | SORT_STRINGSORT, ...) to check whether layer names are equal.
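If your own software runs on Windows, you can call the same function to mirror this comparison. Below is a minimal Python `ctypes` sketch. The wrapper name `layer_names_equal_win32` is hypothetical, and passing -1 for both lengths (null-terminated strings) is my assumption, since the intercepted call above elides those arguments. On other platforms the wrapper simply returns `None`.

```python
import ctypes
import sys

# Constants from the Windows SDK headers.
LOCALE_USER_DEFAULT = 0x0400
NORM_IGNORECASE = 0x00000001
SORT_STRINGSORT = 0x00001000
CSTR_EQUAL = 2


def layer_names_equal_win32(a, b):
    """Compare two names with CompareStringW, using the flags reported above.

    Returns True or False on Windows, None elsewhere.
    """
    if sys.platform != "win32":
        return None

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    compare = kernel32.CompareStringW
    compare.argtypes = [
        wintypes.LCID, wintypes.DWORD,
        wintypes.LPCWSTR, ctypes.c_int,
        wintypes.LPCWSTR, ctypes.c_int,
    ]
    compare.restype = ctypes.c_int

    # -1 means "string is null-terminated" (assumed, see lead-in).
    result = compare(
        LOCALE_USER_DEFAULT,
        NORM_IGNORECASE | SORT_STRINGSORT,
        a, -1, b, -1,
    )
    if result == 0:  # 0 signals failure
        raise ctypes.WinError(ctypes.get_last_error())
    return result == CSTR_EQUAL


if __name__ == "__main__":
    print(layer_names_equal_win32("gro\u00df", "GROSS"))
    print(layer_names_equal_win32("Am\u00e9lie", "Ame\u0301lie"))
```

Because the call uses LOCALE_USER_DEFAULT, the result can depend on the locale of the user running the program, so two machines may not agree on every pair of names.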
