How to remove everything except one HTML tag and its content in Notepad++?
I open an HTML page in Notepad++.
The HTML page contains a lot of things, but in particular this tag:
<div id="issue_content">CONTENT</div>
I'd like to remove everything from the HTML file except this tag and its content:
<div id="issue_content">CONTENT</div>
Example file:
<p>ewrfefsd</p>
<div id="issue_content">CONTENT</div>
<p>ewrfefsd</p>
</html>
After deleting, the contents of the file should look like this:
<div id="issue_content">CONTENT</div>
I tried this regular expression:
(<div id=\"issue_content\">)(.*?)(<\/div>)(.*?)
but it removes only the tag <div id="issue_content">CONTENT</div> and its content.
You can change your regex as follows. The idea is to match the whole document while capturing the string you want in a match group, and then replace everything with that group.
This is the regex (in Notepad++, enter it in the Find what field without surrounding / delimiters):
[\s\S]*?(<div id=\"issue_content\">[^>]+>)[\s\S]*
It matches everything from the start up to the string you want, captures that string in a group, and finally matches everything after it.
When replacing, replace with group 1:
$1
Now only your string is left.
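The same capture-and-replace idea can be checked outside Notepad++. Below is a minimal Python sketch, assuming the sample file from the question. The variable names are illustrative only, and Python's re module writes the group reference as \1 where Notepad++ also accepts $1. The trailing [\s\S]* lets the match succeed even when nothing follows the tag.

```python
import re

# Sample document from the question (assumed input for this sketch).
html = (
    "<p>ewrfefsd</p>\n"
    '<div id="issue_content">CONTENT</div>\n'
    "<p>ewrfefsd</p>\n"
    "</html>\n"
)

# Lazily skip everything before the tag, capture the tag in group 1,
# then consume the rest of the document.
pattern = re.compile(r'[\s\S]*?(<div id="issue_content">[^>]+>)[\s\S]*')

# Replace the whole match with group 1 only.
result = pattern.sub(r"\1", html, count=1)
print(result)
```

Because [^>]+> runs only up to the next > character, this sketch, like the regex it mirrors, assumes the tag contains plain text and no child elements.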
Try this, where $str is the variable holding your HTML content.
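// The s flag lets . match newlines; the lazy .*? stops at the first </div>.
if (preg_match('/<div id="issue_content">(.*?)<\/div>/is', $str, $matches)) {
    echo $matches[0]; // whole tag with its content; $matches[1] is the content alone
}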
This regex should do what you want. Make sure you check the ". matches newline" box on the Replace tab, and position the cursor at the beginning of the document.
^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$
Replace with \1.
Note that this only works if no other <div> tags are nested within the one you are looking for.
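To see the limitation concretely, here is a small Python sketch. The nested markup is a made-up example, and re.DOTALL plays the role of the ". matches newline" box. The lazy .*? stops at the first </div>, so the outer closing tag is lost as soon as another <div> is nested inside.

```python
import re

# Same pattern as above; re.DOTALL makes "." match newlines as well.
pattern = re.compile(
    r'^.*?(<div[^>]*id="issue_content">.*?<\/div>).*$', re.DOTALL
)

# Hypothetical inputs: one flat tag, one with a nested <div>.
flat = '<p>a</p>\n<div id="issue_content">CONTENT</div>\n<p>b</p>'
nested = '<p>a</p>\n<div id="issue_content"><div>inner</div> tail</div>\n<p>b</p>'

flat_result = pattern.sub(r"\1", flat, count=1)
nested_result = pattern.sub(r"\1", nested, count=1)

print(flat_result)    # the complete tag
print(nested_result)  # cut off at the first </div>
```

For documents with nested tags, a real HTML parser is usually a safer choice than a regular expression.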