Select and replace multiple lines in Notepad++ using regex

Question

I have a very large HTML file with the results of a security scan, and I need to strip the useless information out of the document. An example of what I need to remove looks something like this:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

After the edit, the text above should simply be removed. I can't do a standard find, though, because of the variation. Here is another example of what needs to be removed from the document:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

I need to treat the ID number, 10395, as a variable, but its length stays the same. "Microsoft Windows SMB Shares Enumeration" also needs to be treated as a variable, since it changes throughout the document.

I have tried putting something like this into the Replace dialog, but I think I am totally missing the mark:

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

Maybe I should be using a different tool altogether?

Answer 1

These regexes are listed in order from least to most sophisticated, but all of them get the job done:

<a.*>.*\d.*</a>

<a.*>.*\d{5}.*</a>

<a.*id=\d{5}.*>.*\d{5}.*</a>
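To see what these three patterns actually select, here is a small check using Python's `re` module as a stand-in for Notepad++'s regex engine (an assumption; both treat `.`, `*`, `\d` and `{5}` the same way, and in both `.` does not cross line breaks by default). Run against the link line from the question, each pattern matches only the `<a ...> 10395</a>` element on that one line, not the surrounding `<tr>` block.

```python
import re

# The link line from the question's first example, kept on one line
# as it appears in the report.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395"'
        ' target="_blank"> 10395</a>')

patterns = [
    r'<a.*>.*\d.*</a>',               # any digit somewhere in the link text
    r'<a.*>.*\d{5}.*</a>',            # five consecutive digits in the link text
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',  # five digits in the URL and in the text
]

matches = []
for p in patterns:
    m = re.search(p, line)
    # Store the matched text (or None) so the results can be compared
    matches.append(m.group(0) if m else None)
    print(matches[-1])
```

Because the match stops at `</a>`, removing the whole six-line row still needs a pattern that also covers the `<tr>`, the other cells and the line breaks between them.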

Disclaimer: be careful. You can't reliably parse HTML with regex.

Answer 2

I assume that by repeating \1 multiple times you meant it as a placeholder for a single character, but that's not what it does: \1 is a backreference to whatever the first capturing group (...) matched. What you are trying to achieve is something like this:

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
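A quick way to see the difference between the two approaches is the Python sketch below, which uses Python's `re` module in place of Notepad++'s engine (an assumption; both support capturing groups and `\1` backreferences). It first compiles the question's attempt, which Python rejects because `\1` refers to a group that was never defined, and then applies the `<a>` part of the capturing pattern to a link whose text repeats its ID and to one whose text does not.

```python
import re

# The question's attempt: \1 appears before any (...) group exists.
attempt = r'id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>'
try:
    re.compile(attempt)
    attempt_compiles = True
except re.error:
    # Python refuses a backreference to an undefined group
    attempt_compiles = False

# Capture the digits once with (\d+), then require the link text to
# repeat exactly the same digits with \1.
fixed = re.compile(
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)"'
    r' target="_blank"> \1</a>'
)

same = ('<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219"'
        ' target="_blank"> 11219</a>')
different = same.replace("> 11219<", "> 10395<")  # text no longer matches the ID

m = fixed.search(same)
print(attempt_compiles)         # False
print(m.group(1))               # 11219
print(fixed.search(different))  # None
```

The capture does not need to know the ID's length in advance, so `\d+` covers both 10395 and 11219; `\d{5}` would also work given that the length is fixed.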

To match the whole 6 lines:

<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

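Then replace it with an empty string, with Search Mode set to "Regular expression" in the Replace dialog. The \s* between tags matches the line breaks, so ". matches newline" does not need to be enabled.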
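Since the question also asks about other tools, here is a minimal Python sketch that applies the same expression with `re.sub`, assuming the whole report fits in memory and uses exactly the markup shown in the question. The function name `strip_info_rows` is hypothetical. Python's `re` should behave like Notepad++ for the constructs used here (literals, `\s`, `\d`, `.*?`, a capturing group and `\1`), but that equivalence is an assumption worth checking on a copy of the file.

```python
import re

# Same pattern as the Notepad++ expression above, split into raw strings
# for readability. \s* absorbs the line breaks between tags (\n or \r\n),
# so no DOTALL flag is needed; .*? only has to span the one-line title.
INFO_ROW = re.compile(
    r'<tr>\s*'
    r'<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*'
    r'<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*'
    r'</td>\s*'
    r'<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*'
    r'</tr>'
)


def strip_info_rows(html: str) -> str:
    """Delete every 'Info' row, like Replace All with an empty replacement."""
    return INFO_ROW.sub('', html)
```

To process a report on disk, read it with `pathlib.Path(...).read_text(encoding="utf-8")`, pass the text through `strip_info_rows`, and write the result to a new file so the original scan output is kept.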

Answer 1: score 1, posted 2017-06-16 17:17:54

Answer 2: score 1, accepted, posted 2017-06-16 17:22:43
