

Comparing two RegEx objects in Node.js

I'm using Node-RED to perform some logic on a string that was produced by image analysis (OCR) in Microsoft Azure Cognitive Services. The image analysis doesn't support any pattern matching or input pattern.

The resulting string (let's call it 'A') sometimes contains misread characters, typically 'l' and '1' or 's' and '5' being mixed up.
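A minimal sketch of these confusions as data, assuming the pairs below are the only ones that matter ('l'/'1' and 's'/'5' come from the sentence above, 'z'/'2' from the example further down). `confusablePairs` and `buildConfusionMap` are hypothetical names. Building the lookup from pairs keeps it symmetric, so 'l' may be read as '1' and '1' as 'l':

```javascript
// Hypothetical list of OCR confusions; extend it with whatever your OCR gets wrong.
const confusablePairs = [
  ["l", "1"],
  ["s", "5"],
  ["z", "2"],
];

// Build a lookup: character -> every character it might really be (including itself).
function buildConfusionMap(pairs) {
  const sets = {};
  for (const [a, b] of pairs) {
    // A character can always be read as itself.
    sets[a] = sets[a] || new Set([a]);
    sets[b] = sets[b] || new Set([b]);
    sets[a].add(b);
    sets[b].add(a);
  }
  // Convert the sets to sorted arrays for predictable output.
  const map = {};
  for (const [ch, options] of Object.entries(sets)) {
    map[ch] = [...options].sort();
  }
  return map;
}

const confusionMap = buildConfusionMap(confusablePairs);
console.log(confusionMap["l"]); // [ '1', 'l' ]
console.log(confusionMap["a"]); // undefined: 'a' is not confusable
```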

The resulting string can only be one of a few different formats; for argument's sake, let's say:

  1. [a-z]{4,5}
  2. [a-g]{3}[0-9]{1,2}
  3. [0-9][a-z]{4}
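As a starting point, the three example formats can be written as anchored JavaScript regular expressions and tested directly against a string. This is a minimal sketch; the `formats` object and `matchingFormats` function are hypothetical names:

```javascript
// The three example formats, anchored with ^...$ so the whole string must match.
const formats = {
  1: /^[a-z]{4,5}$/,
  2: /^[a-g]{3}[0-9]{1,2}$/,
  3: /^[0-9][a-z]{4}$/,
};

// Return the keys of every format the string matches exactly.
function matchingFormats(str) {
  return Object.keys(formats).filter((key) => formats[key].test(str));
}

console.log(matchingFormats("abcd"));  // [ '1' ]
console.log(matchingFormats("abc12")); // [ '2' ]
console.log(matchingFormats("1abcd")); // [ '3' ]
console.log(matchingFormats("abc1z")); // []
```

The patterns deliberately have no `g` flag: with `g`, `RegExp.prototype.test` advances `lastIndex` between calls, so reusing one regex on several strings can give unexpected `false` results.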

What I need to do is determine which format ('1', '2' or '3') the interpreted string ('A') most closely aligns to. Once I've established this, I plan to correct the misinterpreted characters and hopefully be left with a string that is (nearly) perfect.
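One way to make "most closely aligns" concrete (an assumption; the question doesn't define it) is to count how many confusable characters must be swapped before the string fits a format, and pick the format that needs the fewest swaps. The winning reading is then also the corrected string. A sketch with hypothetical names (`confusions`, `readings`, `closestFormat`):

```javascript
// Hypothetical confusion table: character -> characters it might really be.
const confusions = { "1": ["l"], l: ["1"], "2": ["z"], z: ["2"], "5": ["s"], s: ["5"] };

// The three example formats, anchored to the whole string.
const formats = {
  1: /^[a-z]{4,5}$/,
  2: /^[a-g]{3}[0-9]{1,2}$/,
  3: /^[0-9][a-z]{4}$/,
};

// Yield every reading of `input`, counting how many characters were swapped.
function* readings(input, i = 0, prefix = "", swaps = 0) {
  if (i === input.length) {
    yield { text: prefix, swaps };
    return;
  }
  const ch = input[i];
  yield* readings(input, i + 1, prefix + ch, swaps); // keep the character
  for (const alt of confusions[ch] || []) {
    yield* readings(input, i + 1, prefix + alt, swaps + 1); // swap it
  }
}

// Pick the matching reading that needs the fewest swaps.
// Ties keep the first match found; returns null if no reading fits any format.
function closestFormat(input) {
  let best = null;
  for (const r of readings(input)) {
    for (const [format, re] of Object.entries(formats)) {
      if (re.test(r.text) && (!best || r.swaps < best.swaps)) {
        best = { format, text: r.text, swaps: r.swaps };
      }
    }
  }
  return best;
}

console.log(closestFormat("abclz")); // { format: '1', text: 'abclz', swaps: 0 }
console.log(closestFormat("abc1s")); // { format: '2', text: 'abc15', swaps: 1 }
```

In the second example the trailing 's' is read as '5', so the string fits format 2 with a single correction. The number of readings grows exponentially with the number of confusable characters, which should still be manageable for short strings.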

My initial plan was to convert 'A' into a RegEx: if 'A' came back as "12345", I would change it to the RegEx object [1l][2z]34[5s], compare this object to the format RegEx objects, and hopefully one of them would come back as a match.
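The regex-building half of this plan can be sketched as follows. Inside square brackets `|` is a literal character, so `[1|l]` would also match a pipe; `[1l]` is what is meant. The `confusions` table and `fuzzyRegExp` are hypothetical. As far as I know, JavaScript has no built-in way to compare two RegExp objects for overlapping matches (only their `source` and `flags` strings can be compared), so a regex like this is more useful for testing concrete candidate strings than for comparing against the format patterns directly:

```javascript
// Hypothetical confusion table: OCR character -> the characters it could be.
const confusions = { "1": "1l", l: "1l", "2": "2z", z: "2z", "5": "5s", s: "5s" };

// Escape characters that have special meaning in a RegExp.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Turn an OCR string into an anchored RegExp where each confusable character
// becomes a character class, e.g. "12345" -> /^[1l][2z]34[5s]$/.
function fuzzyRegExp(ocr) {
  const body = [...ocr]
    .map((ch) => (confusions[ch] ? `[${confusions[ch]}]` : escapeRegExp(ch)))
    .join("");
  return new RegExp(`^${body}$`);
}

const re = fuzzyRegExp("12345");
console.log(re.source);        // ^[1l][2z]34[5s]$
console.log(re.test("lz34s")); // true
```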

In reality, the interpreted string is more like 8 alphanumeric characters with five different (fairly complex) RegEx possibilities, but I've tried to simplify the problem for the purposes of this question.
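At that size, brute force is still cheap. The number of readings is the product of the number of options per character: 8 characters that each have one confusion give 2^8 = 256 readings, and with five patterns that is at most 1,280 regex tests. A sketch for checking the count before enumerating, assuming each confusable character has exactly two readings (`confusions` and `countReadings` are hypothetical):

```javascript
// Hypothetical confusion table: character -> every character it might be.
const confusions = {
  "1": ["1", "l"], l: ["1", "l"],
  "2": ["2", "z"], z: ["2", "z"],
  "5": ["5", "s"], s: ["5", "s"],
};

// Size of the Cartesian product: multiply the number of options per character.
function countReadings(ocr) {
  return [...ocr].reduce((n, ch) => n * (confusions[ch] ? confusions[ch].length : 1), 1);
}

const patternCount = 5; // the five patterns mentioned in the question
const worstCase = countReadings("l1s5z2l1"); // 8 characters, all confusable

console.log(countReadings("12345"));   // 8
console.log(worstCase);                // 256
console.log(worstCase * patternCount); // 1280 regex tests at most
```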

So the question is: is it possible to compare RegEx objects in this way? Does anyone have any other suggestions on how this image analysis could be improved?

Thanks

Here is a solution that uses a Cartesian product to generate every possible reading of the string and then tests each reading against the patterns. The test string is 'abclz', which could match pattern1 or pattern2:

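    // Cartesian product: every way of picking one item from each list.
    // Starting from [[]] also handles single-character and empty input.
    const cartesian = (...lists) =>
      lists.reduce(
        (acc, list) => acc.flatMap(prefix => list.map(item => [...prefix, item])),
        [[]]
      );

    // Characters the OCR tends to confuse, mapped to every possible reading.
    const charMapping = {
      '1': ['1', 'l'], 'l': ['1', 'l'],
      '2': ['2', 'z'], 'z': ['2', 'z'],
      '5': ['5', 's'], 's': ['5', 's']
    };

    // The candidate formats, anchored so each must match the whole string.
    const buckets = {
      pattern1: /^[a-z]{4,5}$/,
      pattern2: /^[a-g]{3}[0-9]{1,2}$/,
      pattern3: /^[0-9][a-z]{4}$/
    };

    const input = 'abclz';
    console.log('input:', input);

    // One list of possible readings per input character.
    const params = input.split('').map(c => charMapping[c] || [c]);

    // Every string the OCR output could stand for.
    const toCompare = cartesian(...params).map(arr => arr.join(''));
    console.log('toCompare:', toCompare);

    // Test each candidate against each pattern; keep the candidates that match.
    const potentialMatches = toCompare.flatMap(str =>
      Object.keys(buckets)
        .map(pattern => {
          const match = buckets[pattern].test(str);
          console.log(str, pattern + ':', match);
          return match ? str : null;
        })
        .filter(Boolean)
    );
    console.log('potentialMatches:', potentialMatches);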

Output:

input: abclz
toCompare: [
  "abc12",
  "abc1z",
  "abcl2",
  "abclz"
]
abc12 pattern1: false
abc12 pattern2: true
abc12 pattern3: false
abc1z pattern1: false
abc1z pattern2: false
abc1z pattern3: false
abcl2 pattern1: false
abcl2 pattern2: false
abcl2 pattern3: false
abclz pattern1: true
abclz pattern2: false
abclz pattern3: false
potentialMatches: [
  "abc12",
  "abclz"
]
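The output above lists the matching strings but not which pattern each one satisfied, which is what the question is ultimately after. A variant that groups the candidates by pattern is sketched below; it reuses the answer's `charMapping` and `buckets`, and `matchesByPattern` is a hypothetical name:

```javascript
// Cartesian product starting from [[]], so one-character input also works.
const cartesian = (...lists) =>
  lists.reduce((acc, list) => acc.flatMap((prefix) => list.map((item) => [...prefix, item])), [[]]);

const charMapping = {
  "1": ["1", "l"], l: ["1", "l"],
  "2": ["2", "z"], z: ["2", "z"],
  "5": ["5", "s"], s: ["5", "s"],
};

const buckets = {
  pattern1: /^[a-z]{4,5}$/,
  pattern2: /^[a-g]{3}[0-9]{1,2}$/,
  pattern3: /^[0-9][a-z]{4}$/,
};

// Group the candidate readings of `input` by the pattern they satisfy.
function matchesByPattern(input) {
  const options = [...input].map((c) => charMapping[c] || [c]);
  const candidates = cartesian(...options).map((arr) => arr.join(""));
  const result = {};
  for (const [name, re] of Object.entries(buckets)) {
    result[name] = candidates.filter((str) => re.test(str));
  }
  return result;
}

console.log(matchesByPattern("abclz"));
// { pattern1: [ 'abclz' ], pattern2: [ 'abc12' ], pattern3: [] }
```

An empty list rules a pattern out; when more than one list is non-empty, as here, the string is still ambiguous and some tie-breaking rule (for example, fewest substituted characters) is needed.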
