How can I make a regex match all words except for one word?
I have this text:
<tag>Value<tag>
and I want to convert it to
<%= Value %>
I was able to do it using:
Regex.Replace(text, "<tag>(.*?)<tag>", "<%= $1 %>", RegexOptions.Compiled);
However, the text could contain the sequence "=\n" anywhere, for example:
<tag=\n>Value<tag>
<tag>Value<tag=\n>
<tag>Value=\n<tag>
<tag>=\nValue<tag>
<tag>Va=\nlue<tag>
<ta=\ng>Value<tag>
How can I modify my pattern so that it still works?
A simple way out would be to remove =\n before passing your string to the regex:
Regex.Replace(text.Replace(@"=\n", ""), "<tag>([^<]*)<tag>", "<%= $1 %>", RegexOptions.Compiled);
Note that I also replaced the reluctant dot-asterisk .*? with [^<]*, a negated character class that can never run past the next <, to limit backtracking.
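A minimal, self-contained C# sketch of the strip-then-replace approach above, run against the sample inputs from the question. It assumes "=\n" is the literal three-character sequence '=', '\', 'n' (which is what the verbatim string @"=\n" matches), not '=' followed by a newline; the class name StripThenReplace is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the strip-then-replace answer.
public static class StripThenReplace
{
    // Same pattern as the answer: [^<]* cannot run past the next '<'.
    private static readonly Regex TagPair =
        new Regex("<tag>([^<]*)<tag>", RegexOptions.Compiled);

    public static string Convert(string text)
    {
        // Step 1: drop every "=\n". Assumption: it is the literal sequence
        // '=', '\', 'n', which is what the verbatim string @"=\n" matches.
        string cleaned = text.Replace(@"=\n", "");

        // Step 2: rewrite each <tag>Value<tag> pair as <%= Value %>.
        return TagPair.Replace(cleaned, "<%= $1 %>");
    }
}
```

For example, StripThenReplace.Convert(@"<ta=\ng>Value<tag>") returns "<%= Value %>". Because every "=\n" is removed up front, values such as Va=\nlue come out clean too; the trade-off is that any "=\n" outside the tags is removed as well.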
First, you simply can't do what you want reliably, consistently, or in general using regular expressions. For more on why you shouldn't parse SGML-derived markup languages with regular expressions, see @bobince's definitive answer on parsing (X)HTML.
That out of the way, here's the regex you'd need to use. Why does it look like this? Because regex syntax has no operator for "interspersed between", so the optional =\n has to be spelled out between every pair of characters in the tag. (The language being matched is still regular, as the pattern itself shows; there is just no shorthand for writing it.)
<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>(?<value>([^<]*))<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>
You'll have to change your replacement pattern a bit:
<%= ${value} %>
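A sketch of how the pattern and replacement above plug into Regex.Replace in C#. The pattern is written as a verbatim string so that each \\ reaches the regex engine unchanged and matches a literal backslash; that assumes "=\n" is the literal sequence '=', '\', 'n' rather than '=' followed by a newline. The class name InterspersedTagPattern is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern and replacement string.
public static class InterspersedTagPattern
{
    // Verbatim string, so each \\ reaches the regex engine as \\ and matches a
    // literal backslash. Assumption: "=\n" is the literal sequence '=', '\', 'n'.
    private static readonly Regex TagPair = new Regex(
        @"<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>(?<value>([^<]*))<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>",
        RegexOptions.Compiled);

    public static string Convert(string text)
    {
        // ${value} substitutes the named group; an "=\n" inside the value is kept.
        return TagPair.Replace(text, "<%= ${value} %>");
    }
}
```

Note that an "=\n" inside the value survives: InterspersedTagPattern.Convert(@"<tag>Va=\nlue<tag>") yields <%= Va=\nlue %>, which is the limitation discussed next.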
If you also need to remove the "=\n", you won't be able to do it in the same regex. (It looks like you're trying to process escaped text, which you should also never do: whatever odd escaping scheme you have, unescape the text, process it, and then escape it again if necessary.) In fact, you'd probably need two passes over the text: one to grab each value and sanitize it in procedural code, and another to re-insert the values in their proper places.
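If procedural code is acceptable anyway, .NET's Regex.Replace overload that takes a MatchEvaluator can fold the per-value cleanup into the same call instead of two separate passes: the regex still only locates the tags, and ordinary C# strips "=\n" from each captured value. This is a swapped-in technique rather than the two-pass approach described above; it reuses the answer's pattern, assumes "=\n" is the literal sequence '=', '\', 'n', and the class name EvaluatorCleanup is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the answer's pattern plus a MatchEvaluator for cleanup.
public static class EvaluatorCleanup
{
    // Same interspersed pattern (assumes "=\n" is literal '=', '\', 'n').
    private static readonly Regex TagPair = new Regex(
        @"<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>(?<value>([^<]*))<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>",
        RegexOptions.Compiled);

    public static string Convert(string text)
    {
        // The evaluator runs plain C# for each match, so the captured value is
        // cleaned before being written back. Any "=\n" inside the tags vanishes
        // because the whole match, tags included, is replaced.
        return TagPair.Replace(text, m =>
            "<%= " + m.Groups["value"].Value.Replace(@"=\n", "") + " %>");
    }
}
```

Unlike stripping "=\n" from the whole input first, this only touches text inside a matched tag pair; everything outside the matches is left as it was.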
TL;DR: Use a real XML parser if you want to "convert XML to ASP pages" (which appears to be your goal).
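Try this:
// Regex.Replace returns a new string, so keep each result. Note "(=\\n)" here is the
// regex (=\n), i.e. '=' + newline; for a literal backslash-n use @"=\\n" instead.
text = Regex.Replace(text, "(=\\n)", "", RegexOptions.Compiled);
text = Regex.Replace(text, "<tag>(.*?)<tag>", "<%= $1 %>", RegexOptions.Compiled);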