
Regex Unicode character in Vim

I'm being an idiot.

Someone cut and pasted some text from Microsoft Word into my lovely HTML files.

I now have these Unicode characters instead of regular quote symbols (i.e. quotes appear as <92> in the text).

I want to do a regex replace, but I'm having trouble selecting them.

:%s/\u92/'/g
:%s/\u5C/'/g
:%s/\x92/'/g
:%s/\x5C/'/g

...all fail. My google-fu has failed me.

From :help regexp (lightly edited): to match Unicode characters with a regular expression in Vim, you need to use some specific syntax:

\%u match specified multibyte character (eg \%u20ac)

That is, to search for the Unicode character with hex code 20AC, enter this into your search pattern:

\%u20ac

The full table of character search patterns includes some additional options:

\%d match specified decimal character (eg \%d123)
\%x match specified hex character (eg \%x2a)
\%o match specified octal character (eg \%o040)
\%u match specified multibyte character (eg \%u20ac)
\%U match specified large multibyte character (eg \%U12345678)
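Applied to the question above, a minimal sketch might look like the following. It assumes the <92> that Vim displays is the single byte 0x92, which is the right single quotation mark in Windows-1252; that is an assumption about the file, not something the question confirms. Outside a [] collection, \u and \x are character classes (an uppercase letter and a hex digit), which is why \u92 and \x92 do not match what was intended, while the \% forms do:

```vim
" With the cursor on the odd character, print its decimal/hex/octal code
" (same as pressing ga in Normal mode) to confirm what it really is.
:ascii

" Assumption: the character has hex code 92. Replace it with an apostrophe.
:%s/\%x92/'/g

" If the file instead contains real curly single quotes (U+2018 / U+2019),
" match those code points with \%u.
:%s/\%u2018\|\%u2019/'/g
```

Checking the code with :ascii first avoids guessing which of the two substitutions applies.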

This solution might not address the problem as originally stated, but it does address a different, very closely related one, so I think it makes sense to place it here.

I don't know in which version of Vim it was implemented, but I was working on 7.4 when I tried it.

In Insert mode, the key sequence to enter a Unicode character is ctrl-v u xxxx, where xxxx is the code point in hex. For instance, entering the euro sign would be ctrl-v u 20ac.

I tried it in Command-line mode as well and it worked. That is, to replace all instances of "20 euro" in my document with "20 €", I'd do:

:%s/20 euro/20 <ctrl-v u 20ac>/gc

In the above, <ctrl-v u 20ac> is not literal text; it is the sequence of keys that will enter the character.
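Where typing the key sequence is inconvenient, for example in a script or a mapping, one alternative is to build the replacement with an expression. This is only a sketch and was not part of the original answer: it relies on Vim's \= sub-replace expression and the built-in nr2char() function, and assumes 'encoding' is set to utf-8.

```vim
" \= makes Vim evaluate the rest of the replacement as an expression.
" nr2char(0x20ac) returns the character with code point U+20AC (the euro sign).
:%s/20 euro/\='20 ' . nr2char(0x20ac)/gc
```

The c flag asks for confirmation on each match, as in the command above.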
