Find and Replace with Notepad++

I have a document that was converted from PDF to HTML for use on a company website, where it will be referenced and indexed for search. While formatting the converted document, I am trying to clean up the junk carried over from the PDF, such as page numbers, headers, and footers. Luckily, the lines that need to be removed all come in blocks of four. Unfortunately, the blocks are not identical, so a simple literal replace cannot remove them: they contain numbers that increment with the page. How can I remove blocks like the following example from my HTML file?

Title<br>
10<br>
<hr>
<A name=11></a>Footer<br>

I've tried many different regular expressions, but my skill in that area is limited and I can't find the proper syntax. I'm sure I'm missing something fairly easy: it seems all I need is a wildcard for the two numbers in the block, with the rest matched literally.

Any help is appreciated.

Search and replace in Notepad++ (npp) is quite odd. I can't match newline characters in regular expression mode, although the documentation says:

As of v4.9 the Simple find/replace (control+h) has changed, allowing the use of \r \n and \t in regex mode and the extended mode.

I updated to the latest version, but it still doesn't work. Extended mode lets me find newlines, but it doesn't let me specify wildcards.

However, you can use a macro to work around this problem.

  • prepare a search that finds a unique passage (like Title<br>\r\n; here you can use the extended mode)
  • start recording a macro
  • press F3 to run your search
  • select the four lines and delete them
  • stop recording the macro ... done!

Just replay the macro and it deletes what you wanted to delete.
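For anyone who would rather script it than record a macro, the same find-then-delete-four-lines logic can be sketched in Python. This is only an illustration under assumptions: the function name `delete_blocks` and its parameters are made up here, and `Title<br>` is the placeholder marker from the question. Like the macro, it blindly removes the three lines after the marker without checking what they contain.

```python
def delete_blocks(lines, marker="Title<br>", block_len=4):
    """Drop each line equal to `marker` plus the block_len - 1 lines after it.

    Hypothetical helper mirroring the macro: find the marker, delete the block.
    """
    kept = []
    i = 0
    while i < len(lines):
        # Compare without the line ending so both \n and \r\n files work.
        if lines[i].rstrip("\r\n") == marker:
            i += block_len  # skip the whole block, like select-and-delete
        else:
            kept.append(lines[i])
            i += 1
    return kept


page = [
    "<p>body text</p>\n",
    "Title<br>\n",
    "10<br>\n",
    "<hr>\n",
    "<A name=11></a>Footer<br>\n",
    "<p>more body text</p>\n",
]
result = delete_blocks(page)
print("".join(result), end="")
```

With a real file you would read it with `readlines()`, pass the list in, and write the returned lines back out.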

If I have understood your request correctly, this pattern matches your string:

Title<br>( ?)\n([0-9]+)<br>( ?)\n<hr>( ?)\n<A name=([0-9]+)></a>Footer<br>
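If the editor refuses to match across lines, a pattern like this can also be tried outside it. The following is a rough Python sketch, not a tested recipe for Notepad++ or TextPad. It adapts the pattern slightly: `\r?\n` accepts both Windows and Unix line endings, the capture groups are dropped because nothing is reused, and the trailing line break is consumed so no empty line is left behind. The name `strip_page_blocks` is invented for this example, and `Title` and `Footer` are the placeholders from the question.

```python
import re

# Adapted from the pattern above. " ?" tolerates one optional trailing space
# per line; [0-9]+ stands in for the two page numbers that change per block.
BLOCK = re.compile(
    r"Title<br> ?\r?\n"
    r"[0-9]+<br> ?\r?\n"
    r"<hr> ?\r?\n"
    r"<A name=[0-9]+></a>Footer<br> ?(?:\r?\n)?"
)


def strip_page_blocks(html: str) -> str:
    """Remove every four-line header/footer block from the text."""
    return BLOCK.sub("", html)


sample = (
    "<p>End of page ten.</p>\n"
    "Title<br>\n"
    "10<br>\n"
    "<hr>\n"
    "<A name=11></a>Footer<br>\n"
    "<p>Start of page eleven.</p>\n"
)
cleaned = strip_page_blocks(sample)
print(cleaned, end="")
```

Because the rest of each block is matched literally, a block whose title or footer text differs from the placeholders is left untouched rather than partially deleted.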

I use the Regex Coach to try out complicated regex patterns. Other utilities are available.

Edit

As I do not use Notepad++, I cannot be sure that this pattern will work for you. Apologies if it does not. (I'm a TextPad man myself, and it does work with that tool.)
