How can I make a regular expression match all words except one?

Question

I have this text:

<tag>Value<tag>

and I want to convert it to

<%= Value %>

I was able to do it using:

Regex.Replace(text, "<tag>(.*?)<tag>", "<%= $1 %>", RegexOptions.Compiled);

However, the sequence "=\\n" can appear anywhere in the text. For example:

<tag=\n>Value<tag>
<tag>Value<tag=\n>
<tag>Value=\n<tag>
<tag>=\nValue<tag>
<tag>Va=\nlue<tag>
<ta=\ng>Value<tag>

How can I modify my pattern so that it still works?

Answer 1

A simple way out would be to remove =\\n before passing your string to the regex:

Regex.Replace(text.Replace(@"=\n", ""), "<tag>([^<]*)<tag>", "<%= $1 %>", RegexOptions.Compiled);

Note that I also replaced the reluctant dot-asterisk .*? with [^<]* to protect your expression from catastrophic backtracking.
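The strip-then-replace approach above can be sketched as a small helper. This is only an illustration: the class name `TagConverter` and the `marker` parameter are hypothetical, and the marker is passed in because the question does not make clear whether "=\\n" means an equals sign followed by a newline character or the three literal characters `=`, `\`, `n`.

```csharp
using System;
using System.Text.RegularExpressions;

static class TagConverter
{
    // Assumption: "marker" is the stray sequence to drop.
    // Pass "=\n" for '=' + newline, or @"=\n" for the literal characters '=', '\', 'n'.
    public static string Convert(string text, string marker)
    {
        // Step 1: remove every occurrence of the marker (plain string replace, no regex needed).
        string cleaned = text.Replace(marker, "");

        // Step 2: rewrite <tag>Value<tag> as <%= Value %>.
        return Regex.Replace(cleaned, "<tag>([^<]*)<tag>", "<%= $1 %>");
    }
}
```

Because the marker is removed before matching, it does not matter whether it sat inside a tag or inside the value.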

Answer 2

First, you simply can't do what you want to do reliably, consistently, or generally using regular expressions. For more information on why you shouldn't parse SGML-derived markup languages with regular expressions, please see @bobince's definitive answer on parsing (X)HTML.

With that out of the way, here's the regex you'd need to use. Why this one? Because there is no regex operator for "interspersed-between" (as far as I know, such an operator would not be possible in a regular language, so you'd need an entirely different model to write such a string recognizer).

<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>(?<value>([^<]*))<(=\\n)?t(=\\n)?a(=\\n)?g(=\\n)?>

You'll have to change your replace pattern a bit:

<%= ${value} %>
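As a rough sketch of how the pattern and replacement above fit together, the interleaved tag pattern can be generated instead of typed by hand. The names `InterleavedTagPattern`, `BuildTag`, and the `marker` parameter are made up for illustration, and non-capturing groups are used in place of the capturing `(=\\n)?` groups shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class InterleavedTagPattern
{
    // Hypothetical helper: for tagName "tag" it builds
    // "<(?:M)?t(?:M)?a(?:M)?g(?:M)?>", where M is the escaped marker.
    static string BuildTag(string tagName, string marker)
    {
        string optional = "(?:" + Regex.Escape(marker) + ")?";
        string letters = string.Concat(
            tagName.Select(c => Regex.Escape(c.ToString()) + optional));
        return "<" + optional + letters + ">";
    }

    public static string Convert(string text, string marker)
    {
        string tag = BuildTag("tag", marker);
        // The named group "value" is referenced as ${value} in the replacement.
        string pattern = tag + "(?<value>[^<]*)" + tag;
        return Regex.Replace(text, pattern, "<%= ${value} %>");
    }
}
```

Note that this only tolerates the marker inside the tags; a marker inside the value is captured and copied to the output unchanged.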

If you need to remove the "=\\n" (it seems you're trying to process escaped text, which you should also never do: whatever weird escaping routines you have, unescape the text, process it, and escape it again if necessary), you won't be able to do it in the same regex. In fact, you'd probably need two passes through the text: one to grab each value for sanitization in procedural code, then one to re-insert the values at their appropriate places.
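One way to do the "sanitize in procedural code, then re-insert" step in .NET is the `Regex.Replace` overload that takes a `MatchEvaluator` callback: the regex still only finds the match, and the callback cleans the captured value before it is written back. The sketch below assumes that approach; `SanitizingConverter` and the `marker` parameter are hypothetical names, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class SanitizingConverter
{
    public static string Convert(string text, string marker)
    {
        // Optional marker allowed before and after each letter of "tag".
        string m = "(?:" + Regex.Escape(marker) + ")?";
        string tag = "<" + m + "t" + m + "a" + m + "g" + m + ">";
        string pattern = tag + "(?<value>[^<]*)" + tag;

        // The callback runs once per match: it strips the marker from the
        // captured value in ordinary code, then builds the replacement text.
        return Regex.Replace(text, pattern, match =>
        {
            string value = match.Groups["value"].Value.Replace(marker, "");
            return "<%= " + value + " %>";
        });
    }
}
```

The removal itself happens in the callback rather than in the pattern, so this stays consistent with the point above that the regex alone cannot do it.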

TL;DR: Use a real XML parser if you want to "convert XML to ASP pages" (which appears to be your goal).

Answer 3

Try this:

// Regex.Replace returns a new string, so keep the result of each call.
text = Regex.Replace(text, "(=\\n)", "", RegexOptions.Compiled);
text = Regex.Replace(text, "<tag>(.*?)<tag>", "<%= $1 %>", RegexOptions.Compiled);
