Select and replace multiple lines in Notepad++ using regex

I have a very large HTML file with the results of a security scan, and I need to strip the useless information out of the document. An example of what I need to remove looks something like this:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" target="_blank"> 10395</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Microsoft Windows SMB Shares Enumeration</span></td>
</tr>

After the edit, the text above should simply be gone. I can't do a standard find because the rows vary, though. Here is another example of what needs to be removed from the document:

<tr>
<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>
<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=11219" target="_blank"> 11219</a>
</td>
<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">Nessus SYN scanner</span></td>
</tr>

I need to treat the ID number, 10395, as a variable, although its length stays the same. "Microsoft Windows SMB Shares Enumeration" needs to be treated as a variable too, since it changes throughout the document.

I have tried throwing something like this into Replace, but I think I am totally missing the mark.

<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=\1\1\1\1\1" target="_blank"> \1\1\1\1\1</a>

Maybe I should be using a different tool altogether?

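Regexes, in order from least to most sophisticated; all of them find the link that carries the ID. Note that, with ". matches newline" unchecked, each one matches only the <a ...>...</a> element on its own line, not the surrounding <tr> block: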

<a.*>.*\d.*</a>

<a.*>.*\d{5}.*</a>

<a.*id=\d{5}.*>.*\d{5}.*</a>

Disclaimer: be careful. I can't parse HTML with regex.
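As a quick sanity check, the three patterns above can be tried against the link line from the question. This is a minimal sketch in Python rather than Notepad++; it assumes Python's re module treats these particular constructs (greedy .*, \d, {5}) the same way Notepad++ does, which holds for patterns this simple.

```python
import re

# Link line copied from the first example in the question.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
        'target="_blank"> 10395</a>')

# The three patterns from the answer, least to most specific.
patterns = [
    r'<a.*>.*\d.*</a>',
    r'<a.*>.*\d{5}.*</a>',
    r'<a.*id=\d{5}.*>.*\d{5}.*</a>',
]

# Collect the matched text for each pattern (None if a pattern fails).
matched = []
for p in patterns:
    m = re.search(p, line)
    matched.append(m.group(0) if m else None)
    print(p, '->', matched[-1])
```

On this line all three select the same text, from <a href=... through </a>. The stricter patterns only matter when the file also contains links without a five-digit ID.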

I assume that by repeating \1 several times you meant a placeholder for a single character, but that's not how it works: \1 is a backreference that matches whatever the first capture group matched. What you are trying to achieve is something like this:

<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>
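To illustrate what the capture group and backreference do, here is a small Python sketch of the link part of that pattern, with the literal . and ? in the URL escaped. The variable names are made up for the example, and the "mismatched" line is a constructed test case, not something from the scan report.

```python
import re

# Link line copied from the first example in the question.
line = ('<td width="10%" valign="top" class="classcell"> '
        '<a href="http://www.nessus.org/plugins/index.php?view=single&amp;id=10395" '
        'target="_blank"> 10395</a>')

# (\d+) captures the ID from the URL; \1 then requires the very same
# digits to appear again as the link text.
pattern = (r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" '
           r'target="_blank"> \1</a>')

m = re.search(pattern, line)
captured_id = m.group(1) if m else None
print(captured_id)

# Constructed counter-example: link text differs from the ID in the URL,
# so the backreference cannot match.
mismatched = line.replace('> 10395<', '> 99999<')
no_match = re.search(pattern, mismatched)
print(no_match)
```

The backreference is what ties the two occurrences of the ID together; a second independent \d+ would also accept the mismatched line.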

To match the whole six lines:

<tr>\s*<td width="20%" valign="top" class="classcell0"><span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*<td width="10%" valign="top" class="classcell"> <a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*</td>\s*<td width="70%" valign="top" class="classcell"><span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*</tr>

Then you can replace it with an empty string.
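If a different tool is an option, the same removal can be sketched in Python. This is only an illustration under a few assumptions: the make_row helper and the "High" row are invented for the demo (the question shows only Info rows), and the pattern is the six-line regex above split across string literals for readability.

```python
import re

# The six-line pattern from above, split into pieces. No re.VERBOSE flag
# is used, so spaces and '#' inside the pattern stay literal.
INFO_ROW = re.compile(
    r'<tr>\s*'
    r'<td width="20%" valign="top" class="classcell0">'
    r'<span class="classtext" style="color: #ffffff; font-weight: bold !important;">Info</span></td>\s*'
    r'<td width="10%" valign="top" class="classcell"> '
    r'<a href="http://www\.nessus\.org/plugins/index\.php\?view=single&amp;id=(\d+)" target="_blank"> \1</a>\s*'
    r'</td>\s*'
    r'<td width="70%" valign="top" class="classcell">'
    r'<span class="classtext" style="color: #263645; font-weight: normal;">.*?</span></td>\s*'
    r'</tr>'
)


def make_row(severity, plugin_id, title):
    """Build a row shaped like the question's examples (demo helper only)."""
    return (
        '<tr>\n'
        '<td width="20%" valign="top" class="classcell0"><span class="classtext" '
        f'style="color: #ffffff; font-weight: bold !important;">{severity}</span></td>\n'
        '<td width="10%" valign="top" class="classcell"> <a href="http://www.nessus.org/'
        f'plugins/index.php?view=single&amp;id={plugin_id}" target="_blank"> {plugin_id}</a>\n'
        '</td>\n'
        '<td width="70%" valign="top" class="classcell"><span class="classtext" '
        f'style="color: #263645; font-weight: normal;">{title}</span></td>\n'
        '</tr>\n'
    )


html = (
    make_row('Info', '10395', 'Microsoft Windows SMB Shares Enumeration')
    + make_row('High', '99999', 'Hypothetical finding to keep')  # invented row
    + make_row('Info', '11219', 'Nessus SYN scanner')
)

# Replace every matching Info row with an empty string.
cleaned = INFO_ROW.sub('', html)
print(cleaned)
```

Only rows whose first cell reads Info are removed; the blank lines left behind where rows used to be can be cleaned up separately if they matter.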


 