Find and replace with Notepad++

Question

I have a document that was converted from PDF to HTML for use on a company website, where it will be referenced and indexed for search. I'm formatting the converted document to meet my needs. In doing so, I'm trying to clean up some of the junk carried over from the PDF, such as page numbers, headers, and footers. Luckily, all of the lines that need to be removed come in blocks of 4 lines. Unfortunately, the blocks are not exactly identical, so they cannot be removed with a simple literal replace. The lines contain numbers that increment from page to page. How can I remove blocks like the following example from my HTML file?

Title<br>
10<br>
<hr>
<A name=11></a>Footer<br>

I've tried many different regular expressions, but my skill in that area is limited and I can't find the proper syntax. I'm sure I'm missing something fairly easy. It would seem that all I need is a wildcard for the two numbers in the code; the rest is literal.
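The snippet below is a sketch of the idea in the question. It treats the block as literal text and puts a wildcard wherever a page number appears, then builds the resulting regex from the sample block. Python's `re` module stands in for the editor here. The helper name `block_pattern` is hypothetical. The snippet assumes the `Title<br>`, `<hr>` and `Footer<br>` parts are identical on every page.

```python
import re

# The four-line block from the question; only the page numbers change.
SAMPLE_LINES = ["Title<br>", "10<br>", "<hr>", "<A name=11></a>Footer<br>"]


def block_pattern(sample_lines):
    """Build a regex that matches the sample literally, except that every
    run of digits becomes a digit wildcard."""
    escaped_lines = []
    for line in sample_lines:
        # With a capturing group, re.split keeps the digit runs at odd indices.
        parts = re.split(r"(\d+)", line)
        escaped_lines.append(
            "".join(r"\d+" if i % 2 else re.escape(p) for i, p in enumerate(parts))
        )
    # Lines may end in \n (Unix) or \r\n (Windows); also consume the block's final line break.
    return re.compile(r"\r?\n".join(escaped_lines) + r"(?:\r?\n)?")


pattern = block_pattern(SAMPLE_LINES)
html = "intro<br>\nTitle<br>\n10<br>\n<hr>\n<A name=11></a>Footer<br>\nbody<br>\n"
print(repr(pattern.sub("", html)))  # 'intro<br>\nbody<br>\n'
```

Every digit run becomes `\d+`. If the title or footer itself contained digits, those would be generalized too. In this sample, only the two page numbers are affected.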

Any help is appreciated.

Answer 1

The search & replace in npp is quite odd. I can't find newline characters with regular expressions, although the documentation says:

As of v4.9 the Simple find/replace (control+h) has changed, allowing the use of \r \n and \t in regex mode and the extended mode.

I updated to the latest version, but it just doesn't work. Extended mode lets me find newlines, but there I can't specify wildcards.

However, you can use macros to overcome this problem.

1. Prepare a search that will find a unique passage (like Title<br>\r\n; here you can use the extended mode).
2. Start recording a macro.
3. Press F3 to run your search.
4. Mark the four lines and delete them.
5. Stop recording the macro ... done!
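The macro's logic can also be sketched outside the editor in Python: find the marker line, then delete it and the three lines after it. This is only an illustration of the steps, not what Notepad++ does internally. The function name `delete_blocks` is hypothetical, and the sketch assumes the marker `Title<br>` sits on a line by itself.

```python
def delete_blocks(text, marker="Title<br>", block_size=4):
    """Drop every line equal to `marker` plus the lines after it (block_size
    lines in total). The recorded macro removes one block per replay; this
    loop removes all of them in a single pass."""
    lines = text.splitlines(keepends=True)  # keep \n or \r\n on each line
    kept = []
    i = 0
    while i < len(lines):
        if lines[i].rstrip("\r\n") == marker:
            i += block_size  # skip the marker line and the three lines after it
        else:
            kept.append(lines[i])
            i += 1
    return "".join(kept)


html = (
    "intro<br>\n"
    "Title<br>\n10<br>\n<hr>\n<A name=11></a>Footer<br>\n"
    "body<br>\n"
    "Title<br>\n11<br>\n<hr>\n<A name=12></a>Footer<br>\n"
    "end<br>\n"
)
print(delete_blocks(html), end="")  # prints intro<br>, body<br> and end<br> on three lines
```

Like the replayed macro, this deletes blindly. Any other line that is exactly `Title<br>` would take the next three lines with it, so the marker must be unique to the page-break blocks.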

Just replay it and it deletes what you wanted to delete.

Answer 2

If I have understood your request correctly, this pattern matches your string:

Title<br>( ?)\n([0-9]+)<br>( ?)\n<hr>( ?)\n<A name=([0-9]+)></a>Footer<br>
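To check the pattern's logic, it can be tried in Python's `re` module, which treats `( ?)`, `\n` and `[0-9]+` the same way. This says nothing definitive about how Notepad++'s own regex engine behaves. The `\r?\n` variant is an assumption for files saved with Windows (CRLF) line endings. In such files each line break is `\r\n`, so the `( ?)\n` part of the original pattern fails on the `\r`.

```python
import re

# Answer 2's pattern, unchanged.
ANSWER_PATTERN = r"Title<br>( ?)\n([0-9]+)<br>( ?)\n<hr>( ?)\n<A name=([0-9]+)></a>Footer<br>"
# Assumed variant: allow an optional \r before each \n for Windows (CRLF) files.
CRLF_PATTERN = ANSWER_PATTERN.replace(r"\n", r"\r?\n")

unix_block = "Title<br>\n10<br>\n<hr>\n<A name=11></a>Footer<br>"
windows_block = unix_block.replace("\n", "\r\n")

m = re.search(ANSWER_PATTERN, unix_block)
print(m.group(2), m.group(5))                               # 10 11 (the two page numbers)
print(re.search(ANSWER_PATTERN, windows_block))             # None: \n cannot match the \r
print(re.search(CRLF_PATTERN, windows_block) is not None)   # True
```

Neither pattern consumes the line break after `Footer<br>`. Replacing a match with nothing therefore leaves an empty line behind.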

I use the Regex Coach to try out complicated regex patterns. Other utilities are available.

Edit

As I do not use Notepad++, I cannot be sure that this pattern will work for you. Apologies if that turns out to be the case. (I'm a TextPad man myself, and it does work with that tool.)

2 solutions

Solution 1
1 vote, accepted, 2010-06-11 12:05:17

Solution 2
0 votes, 2010-06-11 12:04:34