
RegEx in Excel VBA Matching Extended ASCII Chars Improperly

I'm trying to remove all non-printable and non-ASCII (extended) characters using the following RegEx in Excel VBA:

[^\x09\0A\0D\x20-\xFF]

This should theoretically match anything that's not a tab, linefeed, carriage return or printable ASCII character (character code between hex 20 and FF, or dec 32 and 255). I have confirmed that Microsoft VBScript regular expressions support the \xCC notation, where CC is an ASCII code in hexadecimal.

The problem is that this regex is matching every character above 127. It then throws an "invalid procedure call" on match.value when the matching character's code is above 127. Is it just that VBScript regexes don't support character codes above 127? I can't seem to find this information anywhere. Here's the full code:

regEx.Pattern = "[^\x09\0A\0D\x20-\xFF]"
regEx.IgnoreCase = True 'True to ignore case
regEx.Global = True 'True matches all occurrences, False matches the first occurrence
regEx.MultiLine = True
If regEx.Test(Cells(curRow, curCol).Value) Then
    Set matches = regEx.Execute(Cells(curRow, curCol).Value)
    numReplacements = numReplacements + matches.Count
    For matchNum = matches.Count To 1 Step -1
        Cells(numReplacements - matchNum + 2, 16).Value = matches.Item(matchNum).Value
        Cells(numReplacements - matchNum + 2, 17).Value = Asc(matches.Item(matchNum).Value)
    Next matchNum
    Cells(curRow, curCol).Value = regEx.Replace(Cells(curRow, curCol).Value, replacements(pattNo))
End If

The first character it matches is 0x96 (an en dash, –). I can see it in the "Watches" window when I watch "matches" and expand it. However, when I try to watch matches.Item(matchNum).Value I get (see screenshot). Any ideas?

Microsoft VBScript regular expressions support the \xCC notation where CC is an ASCII code in hexadecimal

Note that ASCII is defined from \x00 to \x7F, where printable ASCII characters are from \x20 to \x7E.

Codes \x80 and above are ANSI, not ASCII.
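One likely reason the range \x20-\xFF misses a character such as 0x96 is that VBA strings are Unicode, so the regex engine compares Unicode code points rather than ANSI byte values. A minimal sketch for checking this on your own machine follows. It assumes a single-byte ANSI code page such as Windows-1252, and the procedure name is hypothetical:

```vba
Sub ShowAnsiVersusUnicode()
    Dim s As String
    s = Chr$(150)   ' ANSI code 150 (hex 96); an en dash on Windows-1252 (assumed code page)

    ' Asc reports the ANSI code, AscW the Unicode code point of the same character.
    Debug.Print "Asc  = " & Asc(s)
    Debug.Print "AscW = " & AscW(s)
End Sub
```

On Windows-1252, ANSI code 150 corresponds to U+2013 (decimal 8211), which lies outside \x20-\xFF even though Asc still reports 150.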

Try this:

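' Start with the allowed ASCII set: tab, LF, CR and printable \x20-\x7E
Dim ii, sExPatern: sExPatern = "[^\x09\x0A\x0D\x20-\x7E"
' Append the characters with ANSI codes 128-255 as literals so they are excluded from the match
For ii = 128 To 255
  sExPatern = sExPatern & Chr(ii)
Next
sExPatern = sExPatern & "]"
'...
regEx.Pattern = sExPatern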

Honestly, I'm not sure about the printability of some codes, e.g. 129, 131, 136, 144, 152, 160 in decimal (my ANSI code page is "Windows Central Europe", so you may want to examine this in more detail).
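To examine how codes 128 to 255 map on a given machine, a loop like the following could list each ANSI code next to the Unicode code point that Chr produces for it. This is only a sketch: the procedure name is hypothetical, it assumes a single-byte ANSI code page, and the output depends on the system's code page.

```vba
Sub ListAnsiCodePoints()
    Dim ii As Long
    For ii = 128 To 255
        ' Columns: ANSI code (decimal), ANSI code (hex), Unicode code point (hex)
        Debug.Print ii, Hex$(ii), Hex$(AscW(Chr$(ii)))
    Next ii
End Sub
```

Rows where the Unicode value equals the ANSI code in the range 80 to 9F typically indicate codes with no assigned printable character in that code page, so those are candidates to leave out of the pattern.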
