
How to remove everything except an HTML tag and its content in Notepad++?

I have an HTML page open in Notepad++.

The page contains a lot of markup, including this tag:

<div id="issue_content">CONTENT</div>

I'd like to remove everything from the HTML file except this tag and its content:

<div id="issue_content">CONTENT</div>

Example file:

<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>

After the deletion, the file should contain only:

<div id="issue_content">CONTENT</div>

I tried this regular expression: (<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)
but it removes only the <div id="issue_content">CONTENT</div> tag and its content, which is the opposite of what I want.
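Assuming the "Replace with" field was left empty, that result is expected: a regex replace deletes exactly the text the pattern matches, and this pattern matches the div, not everything around it. The sketch below uses Python's re module as a stand-in for Notepad++'s regex engine (they should behave the same for this pattern) and applies the question's regex to the example file:

```python
import re

# Example file from the question.
html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# The question's pattern; \" and \/ simply match a literal " and /.
# The trailing (.*?) is lazy, so it matches nothing at all.
pattern = r'(<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)'

# Replacing each match with an empty string deletes exactly the matched text,
# i.e. the div element. Everything outside the match stays in place.
result = re.sub(pattern, "", html)
print(result)
```

Keeping the div therefore needs the reverse approach: match the whole file and write back only the part you want to keep.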

You can change your regex as follows. The idea is that it matches the entire file but captures the string you want in a match group; you then replace the whole match with that group:

This is the regex:

[\s\S]*?(<div id=\"issue_content\">[^>]+>)[\s\S]+

It lazily matches everything from the start of the file up to the string you want, captures that string in group 1, and then matches everything after it.

In the "Replace with" field, use group 1:

$1

Now only your string is left.
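The same replacement can be tried outside Notepad++. This Python sketch applies the answer's pattern, without any leading and trailing slash delimiters (those are PHP/JavaScript syntax and would be matched as literal "/" characters in Notepad++), to the example file. Python's re writes the group reference as \1 in the replacement string:

```python
import re

html = """<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>"""

# [\s\S] matches any character, newlines included, so no DOTALL flag is needed.
# Group 1 is the opening tag plus everything up to the next ">", which is the
# closing </div> as long as CONTENT itself contains no ">".
pattern = r'[\s\S]*?(<div id=\"issue_content\">[^>]+>)[\s\S]+'

# Replace the whole match (the entire file) with group 1 only.
result = re.sub(pattern, r"\1", html)
print(result)  # <div id="issue_content">CONTENT</div>
```

Because [^>]+> stops at the first ">" after the opening tag, the group captures the whole element only when CONTENT contains no tags of its own; with inner markup such as <b>, the capture ends right after the first inner tag.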

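If you are processing the file with PHP instead, try this, where $str holds the HTML content. The s modifier lets . match newlines, and the lazy .*? stops at the first </div>:

if (preg_match('/<div id="issue_content">(.*?)<\/div>/is', $str, $matches)) {
    echo $matches[0]; // the whole <div> element; $matches[1] holds only CONTENT
}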
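For comparison, here is the same extraction sketched in Python, where re.IGNORECASE and re.DOTALL play the roles of PHP's i and s modifiers. The sample puts CONTENT on its own line to show why the dot-all flag matters:

```python
import re

html = """<p>ewrfefsd</p>
<div id="issue_content">
  CONTENT
</div>
<p>ewrfefsd</p>
</html>"""

PATTERN = r'<div id="issue_content">(.*?)</div>'

# Without DOTALL, "." stops at "\n", so the multi-line div is not found.
print(re.search(PATTERN, html, re.IGNORECASE))  # None

m = re.search(PATTERN, html, re.IGNORECASE | re.DOTALL)
if m:
    element = m.group(0)  # the whole <div ...>...</div> element
    inner = m.group(1)    # only the text between the tags
    print(element)
    print(inner.strip())  # CONTENT
```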

This regex should do what you want. On the Replace tab, set Search Mode to "Regular expression", check the ". matches newline" box, and place the cursor at the beginning of the document.

^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$

Replace with \1.

Note that this regex only works if no other <div> tags are nested inside the one you are looking for.
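A Python sketch of this answer's pattern, with re.DOTALL standing in for Notepad++'s ". matches newline" option. The function name keep_issue_content is hypothetical, chosen for illustration. The second sample shows the nested-<div> limitation: the lazy .*? stops at the first </div>, which belongs to the inner element:

```python
import re

# The answer's pattern; with DOTALL, ^ and $ anchor the start and end of the text.
PATTERN = r'^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$'

def keep_issue_content(html: str) -> str:
    """Hypothetical helper: replace the whole text with group 1 (the div)."""
    return re.sub(PATTERN, r"\1", html, flags=re.DOTALL)

flat = '<p>ewrfefsd</p>\n<div id="issue_content">CONTENT</div>\n<p>ewrfefsd</p>\n</html>'
nested = '<p>x</p>\n<div id="issue_content"><div>inner</div> tail</div>\n</html>'

print(keep_issue_content(flat))    # <div id="issue_content">CONTENT</div>
print(keep_issue_content(nested))  # <div id="issue_content"><div>inner</div>
```

In the nested case the outer element's " tail</div>" is lost, so the result is no longer well-formed HTML.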

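Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.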

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM