Notepad++ RegEx remove between tags when word matched
I had a similar question that dealt with numbers; this time I need to do the same for a keyword. Below is the sample data I'm using, taken from a KML file. I would like to remove all placemarks that contain the word footway.
<Placemark>
<styleUrl>#nothing</styleUrl>
<ExtendedData>
<SchemaData>
<SimpleData>highway</SimpleData>
</SchemaData>
</ExtendedData>
<LineString>
<coordinates>0.0000,0.0000,0</coordinates>
</LineString>
</Placemark>
<Placemark>
<styleUrl>#nothing</styleUrl>
<ExtendedData>
<SchemaData>
<SimpleData>footway</SimpleData>
</SchemaData>
</ExtendedData>
<LineString>
<coordinates>0.0000,0.0000,0</coordinates>
</LineString>
</Placemark>
I tried to use the following; however, it is capturing everything:
(?i)<Placemark>.*?footway.*?</Placemark>
Below are my Notepad++ settings:
Find what: (?i)<Placemark>.*?footway.*?</Placemark>
Replace with:
Wrap around
Search Mode: Regular expression, with ". matches newline" checked
Here is a way to go:
Find what: <Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>
Replace with: NOTHING
This will replace all <Placemark> blocks that contain footway, and only them.
(?!<Placemark) is a negative lookahead that asserts no new <Placemark> tag opens between the start tag and footway, so when the file contains many <Placemark> blocks, the regex matches a single <Placemark> block at a time.
(?:(?!<Placemark).)* is a non-capturing group, repeated 0 or more times, that consumes one character at a time as long as <Placemark does not start at that position.
This is working for me with Notepad++ 6.9.2. It also works in this online Python regex tester: https://regex101.com/r/BYGvzo/1
Are you sure you have the correct options (regular expression + ". matches newline") set?
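To try the pattern outside Notepad++, here is a minimal Python sketch. It assumes that `re.DOTALL` plays the role of the ". matches newline" option; the `SAMPLE` string is a shortened copy of the question's data, and the names `SAMPLE`, `PATTERN` and `cleaned` are only illustrative.

```python
import re

# Shortened copy of the sample data from the question.
SAMPLE = """<Placemark>
<styleUrl>#nothing</styleUrl>
<SimpleData>highway</SimpleData>
</Placemark>
<Placemark>
<styleUrl>#nothing</styleUrl>
<SimpleData>footway</SimpleData>
</Placemark>
"""

# Same pattern as above; DOTALL lets "." match newlines as well.
PATTERN = re.compile(
    r"<Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>",
    re.DOTALL,
)

# Replace every matching block with nothing.
cleaned = PATTERN.sub("", SAMPLE)
print(cleaned)
```

Only the block containing footway should disappear; the highway block is left untouched.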
EDIT: Well, after your edit that's a different story! Not sure how to achieve it with a regex. I think it would be way easier to parse the XML and then get rid of the nodes containing the word footway.
See why: RegEx match open tags except XHTML self-contained tags
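A rough sketch of the parsing approach in Python, using the standard `xml.etree.ElementTree` module. Several things here are assumptions rather than part of the question: the placemarks are wrapped in a single `<Document>` root so the text is well-formed XML, the keyword is matched as a substring of element text, and the helper names `local_name` and `remove_placemarks` are made up for this example. Tag names are compared without their namespace, since real KML files usually declare one.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    # Strip a "{namespace}" prefix if the parser added one.
    return tag.rsplit("}", 1)[-1]

def remove_placemarks(xml_text, keyword):
    """Drop every Placemark element whose text content mentions keyword."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so visit each element as a parent.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) != "Placemark":
                continue
            if any(keyword in (el.text or "") for el in child.iter()):
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")

# Hypothetical wrapper document around the question's two placemarks.
DOC = """<Document>
<Placemark><ExtendedData><SchemaData>
<SimpleData>highway</SimpleData>
</SchemaData></ExtendedData></Placemark>
<Placemark><ExtendedData><SchemaData>
<SimpleData>footway</SimpleData>
</SchemaData></ExtendedData></Placemark>
</Document>"""

cleaned = remove_placemarks(DOC, "footway")
print(cleaned)
```

Note that ElementTree may rewrite namespace prefixes when serializing a namespaced file, so the output should be checked before the original is overwritten.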
Simplifying your file, it looks like the first line below, and your regular expression is matching as per the second line:
<Placemark> ... </Placemark> <Placemark> ...footway ... </Placemark>
<Placemark> .*? footway .*? </Placemark>
You need to prevent the first </Placemark> from being included in the match.
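The over-matching can be reproduced with a small Python sketch on the simplified text. The variable names are only illustrative, and the `(?is)` flags stand in for case-insensitive matching plus ". matches newline".

```python
import re

text = "<Placemark> highway </Placemark> <Placemark> footway </Placemark>"

# The pattern from the question: the first lazy ".*?" is still free to run
# across the first </Placemark> until it finally reaches "footway".
lazy = re.compile(r"(?is)<Placemark>.*?footway.*?</Placemark>")
lazy_match = lazy.search(text)

# A variant that refuses to cross another <Placemark> opening tag.
guarded = re.compile(
    r"(?is)<Placemark>(?:(?!<Placemark).)*?footway.*?</Placemark>"
)
guarded_match = guarded.search(text)

print(repr(lazy_match.group(0)))
print(repr(guarded_match.group(0)))
```

The lazy pattern starts at the first <Placemark> and therefore swallows both blocks, while the guarded variant can only start at the block that actually contains footway.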
If this is a one-off or seldom-needed process, then an approach I sometimes use, as it is very adaptable, is as follows. Find a single character that does not occur anywhere in the file. For this example = is used. Do a replace-all of the regular expression (</?P)(lacemark>) with \1=\2, leading to the text:
<P=lacemark> ... </P=lacemark> <P=lacemark> ...footway ... </P=lacemark>
Then do a replace-all of the regular expression <P=lacemark>[^=]*footway[^=]*</P=lacemark> with nothing. Finally, remove all the = characters with another replace-all.
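The three replace-all steps can be sketched in Python as follows. This is an illustration, not something Notepad++ does for you: the function name `remove_blocks_with_marker` is hypothetical, and the code assumes the marker character really is absent from the input (it raises an error otherwise).

```python
import re

def remove_blocks_with_marker(text, keyword="footway", marker="="):
    """Delete Placemark blocks containing keyword via a temporary marker."""
    if marker in text:
        raise ValueError("marker character already occurs in the text")
    # Step 1: plant the marker inside every opening and closing tag.
    marked = re.sub(
        r"(</?P)(lacemark>)",
        lambda m: m.group(1) + marker + m.group(2),
        text,
    )
    # Step 2: [^marker]* cannot cross a tag, so one match stays in one block.
    m_esc, kw = re.escape(marker), re.escape(keyword)
    block = f"<P{m_esc}lacemark>[^{m_esc}]*{kw}[^{m_esc}]*</P{m_esc}lacemark>"
    without_blocks = re.sub(block, "", marked)
    # Step 3: take the marker back out of the remaining tags.
    return without_blocks.replace(marker, "")

simplified = "<Placemark> highway </Placemark> <Placemark> footway </Placemark>"
result = remove_blocks_with_marker(simplified)
print(repr(result))
```

The negated character class does the work here: because every tag now contains the marker, a match that begins in one block cannot extend into the next.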
If there is no easy-to-use single character (i.e. something to use instead of the =), then precede the above steps with some replacements to create an unused character. For example, first replace all & with &amp; then replace all = with &eq;. Now the = is free for use. After the above steps, undo the replacements: first replace all &eq; with =, then replace all &amp; with &.
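A small Python sketch of freeing up a character and restoring it afterwards. It assumes the intended escape for & is &amp; and uses &eq; as the stand-in for =, as in the text; the function names `free_marker` and `restore_marker` are invented for this example.

```python
def free_marker(text, marker="=", entity="&eq;"):
    """Escape & first, then the marker, so the marker no longer occurs."""
    return text.replace("&", "&amp;").replace(marker, entity)

def restore_marker(text, marker="=", entity="&eq;"):
    """Undo free_marker: restore the marker first, then the ampersands."""
    return text.replace(entity, marker).replace("&amp;", "&")

original = "<name>a=b &amp; c</name>"
freed = free_marker(original)
restored = restore_marker(freed)

print(freed)
print(restored == original)
```

The order matters: escaping & first guarantees that any &eq; present afterwards was produced by the second replacement, so the undo step cannot confuse it with text that was already in the file.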