How to compare strings that contain similar-looking characters with different char codes?

I have a problem comparing strings whose characters look similar but have different char codes, like the following:

console.log('³' === '3'); // false

The code above logs false because the char codes are different:

console.log('³'.charCodeAt(0)) // 179
console.log('3'.charCodeAt(0)) // 51

What is a universal way to convert such values so that they are equal? I need this because I have to compare all digits, such as 1, 2, 3, 4, 5, and so on.

Thanks

Look into ASCII folding, which is primarily used to convert accented characters to unaccented ones. There's a JS library for it here.

For the example you provided it will work; for other examples it might not. It depends on how equivalence is defined: nobody but you knows what you mean by "similar", and strictly speaking, different characters are different characters.
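As a hedged alternative to the linked library (this is a different, swapped-in technique, not that library's API), here is a minimal sketch using JavaScript's built-in `String.prototype.normalize` with the `'NFKD'` form (Unicode compatibility decomposition), followed by stripping combining diacritical marks in the range U+0300 to U+036F. `fold` and `foldedEquals` are hypothetical helper names. Under these assumptions it handles the superscript example from the question, but characters that have no compatibility mapping (such as Arabic-Indic digits) are left unchanged, which illustrates the caveat above.

```javascript
// Fold a string using Unicode compatibility decomposition (NFKD), then drop
// combining diacritical marks (U+0300-U+036F). This approximates ASCII
// folding for many characters but is not a complete implementation.
function fold(str) {
  return str
    .normalize('NFKD')                // '³' -> '3', '３' -> '3', 'é' -> 'e' + U+0301
    .replace(/[\u0300-\u036f]/g, ''); // remove the separated accent marks
}

// Two strings are "similar" if their folded forms are identical.
function foldedEquals(a, b) {
  return fold(a) === fold(b);
}

console.log(foldedEquals('³', '3'));       // true
console.log(foldedEquals('café', 'cafe')); // true
console.log(foldedEquals('٣', '3'));       // false: Arabic-Indic digits have no compatibility mapping
```

Compatibility normalization can change string length (for example, the ligature 'ﬁ' becomes 'fi'), so fold both whole strings before comparing rather than comparing character by character.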

If you already know all of the characters you want to map, the easiest way is simply to define a mapping yourself:

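// Map each look-alike character to its canonical character.
var mappings = { '³': '3' };

// Look up the canonical form; unmapped characters stand for themselves,
// so eqls('a', 'a') is true instead of false.
var canonical = function(ch) {
    return Object.prototype.hasOwnProperty.call(mappings, ch) ? mappings[ch] : ch;
};

var eqls = function(first, second) {
    return canonical(first) === canonical(second);
};

if (eqls('³', '3')) { /* ... */ }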
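Since the question asks about all digits, here is a sketch that fills such a table for the superscript (⁰ to ⁹) and subscript (₀ to ₉) digits in a loop instead of listing each pair by hand, assuming these are the only look-alikes you need. `digitMap` and `eqlsDigit` are hypothetical names; like `eqls`, it compares single characters.

```javascript
// Superscript and subscript forms of 0-9, in digit order.
// All of these are single UTF-16 code units, so string indexing is safe.
const SUPERSCRIPT_DIGITS = '⁰¹²³⁴⁵⁶⁷⁸⁹';
const SUBSCRIPT_DIGITS = '₀₁₂₃₄₅₆₇₈₉';

// Build the look-alike -> ASCII digit table once.
const digitMap = {};
for (let d = 0; d <= 9; d++) {
  digitMap[SUPERSCRIPT_DIGITS[d]] = String(d);
  digitMap[SUBSCRIPT_DIGITS[d]] = String(d);
}

// Compare two single characters, treating mapped look-alikes as their digit.
// Characters missing from the table are compared as themselves.
function eqlsDigit(first, second) {
  const a = digitMap[first] ?? first;
  const b = digitMap[second] ?? second;
  return a === b;
}

console.log(eqlsDigit('³', '3')); // true
console.log(eqlsDigit('₇', '7')); // true
console.log(eqlsDigit('²', '3')); // false
```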

There is no "universal solution".

If you only have to deal with digits, you can build an "equivalence table" in which you define a "canonical" character for each supported character.

For example:

var eqTable = []; // the table is just an array

eqTable[179] = 51; // ³ --> 3
/* ... */

Then write a simple algorithm to turn a string into its canonical form:

var original = "³3";  // the source string
var canonical = "";   // the canonical resulting string

for( var i = 0; i < original.length; i++ )
{
    // canonical char code for the current character, if the table has one
    var c = eqTable[ original.charCodeAt( i ) ];
    if( c !== undefined )
    {
        canonical += String.fromCharCode( c );
    }
    else
    {
        canonical += original[ i ]; // you *may* keep the original character if no match is found
    }
}

// RESULT: canonical == "33"
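The same idea can be wrapped into reusable functions so that both strings are canonicalized before they are compared. This sketch keys the table by code point (via `codePointAt`) in a `Map` and iterates with `for...of`, which visits whole code points, so characters outside the Basic Multilingual Plane are looked up correctly; `charCodeAt` in the loop above would see them as two separate surrogate halves. `eqMap`, `toCanonical`, `canonicalEquals`, and the sample table entries are illustrative assumptions.

```javascript
// Equivalence table: code point of a look-alike -> canonical character.
// Only a few entries are shown; extend it with the characters you need.
const eqMap = new Map([
  [0x00B9, '1'],  // ¹ SUPERSCRIPT ONE
  [0x00B2, '2'],  // ² SUPERSCRIPT TWO
  [0x00B3, '3'],  // ³ SUPERSCRIPT THREE
  [0x1D7D1, '3'], // MATHEMATICAL BOLD DIGIT THREE (outside the BMP)
]);

// Replace every mapped character with its canonical form.
function toCanonical(str) {
  let canonical = '';
  // for...of iterates by code point, so a surrogate pair is one `ch`.
  for (const ch of str) {
    const mapped = eqMap.get(ch.codePointAt(0));
    // Keep the original character when the table has no entry for it.
    canonical += mapped !== undefined ? mapped : ch;
  }
  return canonical;
}

function canonicalEquals(a, b) {
  return toCanonical(a) === toCanonical(b);
}

console.log(toCanonical('³3'));            // 33
console.log(canonicalEquals('x²', 'x2'));  // true
```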

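Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.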