
Notepad++ RegEx remove between tags when word matched

I had a similar question that dealt with numbers; this time I need to do the same for a keyword. Below is the sample data that I'm using from a KML file. I would like to remove all placemarks that contain the word footway.

 <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>highway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>     
    <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>footway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>

I tried to use the following, however it is capturing everything:

(?i)<Placemark>.*?footway.*?</Placemark>

Below are my Notepad++ settings:

Find what: (?i)<Placemark>.*?footway.*?</Placemark>
Replace with:
Wrap around
Search Mode: Regular expression & . matches newline

Here is a way to go:

  • Find what: <Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>
  • Replace with: NOTHING (leave the field empty)

This will replace all <Placemark> blocks that contain footway, and only them.

(?!<Placemark) is a negative lookahead. It ensures that no other <Placemark opening tag occurs between the opening <Placemark> and footway, so when the file has many <Placemark> blocks the regex matches a single one at a time.

(?:(?!<Placemark).)* is a non-capturing group repeated 0 or more times. Each repetition matches one character, but only at a position where <Placemark does not start.
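As a rough illustration outside Notepad++, the same pattern can be tried with Python's re module. This is only a sketch: the KML string is a trimmed stand-in for the question's data, remove_footway is a name made up for this example, and re.DOTALL is assumed to play the role of the ". matches newline" option.

```python
import re

# Trimmed stand-in for the KML fragment in the question (an assumption).
KML = """<Placemark>
    <SimpleData>highway</SimpleData>
</Placemark>
<Placemark>
    <SimpleData>footway</SimpleData>
</Placemark>
"""

# The pattern from the answer, unchanged. re.DOTALL lets "." match
# newlines, like Notepad++'s ". matches newline" checkbox.
PATTERN = re.compile(
    r"<Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>",
    re.DOTALL,
)


def remove_footway(text: str) -> str:
    """Delete every <Placemark> block that contains the word footway."""
    return PATTERN.sub("", text)


cleaned = remove_footway(KML)
print(cleaned)
```

Because each repetition refuses to step onto a new <Placemark, a match can never start in one block and end in another.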

This is working for me with Notepad++ 6.9.2. It also works in this online Python regex tester: https://regex101.com/r/BYGvzo/1

Are you sure you have the correct options (Regular expression + ". matches newline") set?

EDIT: Well, after your edit that's a different story! Not sure how to achieve it with a regex. I think it would be way easier to parse the XML and then get rid of the nodes containing the word footway.

See why: RegEx match open tags except XHTML self-contained tags
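A minimal sketch of the parsing approach, using Python's standard xml.etree.ElementTree. Several things here are assumptions rather than part of the original post: the <Document> wrapper (the question's fragment has no root element), the helper names local_name and drop_placemarks, and the decision to look only at element text, not attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document; a real KML file normally also
# declares an XML namespace on its root element.
SAMPLE = """<Document>
  <Placemark><ExtendedData><SchemaData><SimpleData>highway</SimpleData></SchemaData></ExtendedData></Placemark>
  <Placemark><ExtendedData><SchemaData><SimpleData>footway</SimpleData></SchemaData></ExtendedData></Placemark>
</Document>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if present, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def drop_placemarks(root: ET.Element, word: str) -> int:
    """Remove each Placemark whose text content contains `word`.

    Returns the number of removed elements.
    """
    removed = 0
    # ElementTree elements do not know their parent, so walk every
    # element and inspect its direct children instead.
    for parent in list(root.iter()):
        for child in list(parent):
            is_placemark = local_name(child.tag) == "Placemark"
            if is_placemark and word in "".join(child.itertext()):
                parent.remove(child)
                removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = drop_placemarks(root, "footway")
result = ET.tostring(root, encoding="unicode")
print(removed, result)
```

Unlike the regex, this only looks at text inside elements, so a footway that appears solely in an attribute value would not trigger a removal.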

Simplifying your file, it looks like the first line below, and your regular expression is matching as shown in the second line:

<Placemark> ... </Placemark> <Placemark> ...footway ... </Placemark>
<Placemark>    .*?                          footway .*? </Placemark>

The first </Placemark> needs to be prevented from being included in the match.
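The over-match can be reproduced with a small Python sketch. The one-line TEXT is a made-up simplification of the file, and re.DOTALL is assumed to correspond to ". matches newline".

```python
import re

# Two simplified placemarks; only the second contains footway.
TEXT = "<Placemark>highway</Placemark><Placemark>footway</Placemark>"

# The pattern from the question. The lazy .*? still starts at the
# FIRST <Placemark> and simply grows until it reaches footway.
naive = re.compile(r"(?i)<Placemark>.*?footway.*?</Placemark>", re.DOTALL)

match = naive.search(TEXT)
print(match.start(), match.group(0))
```

Lazy quantifiers make a match as short as possible from its starting point, but they do not move the starting point, which is why the highway block gets swallowed as well.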

If this is a one-off or seldom-needed process, then an approach I sometimes use, because it is very adaptable, is as follows. Find a single character that does not occur anywhere in the file. For this example = is used. Do a replace-all of the regular expression (</?P)(lacemark>) with \1=\2, leading to the text:

<P=lacemark> ... </P=lacemark> <P=lacemark> ...footway ... </P=lacemark>

Then do a replace-all of the regular expression <P=lacemark>[^=]*footway[^=]*</P=lacemark> with nothing. Finally, remove all the = characters with another replace-all.
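The three replace-all steps could be sketched in Python roughly as follows. The function name remove_marked_blocks and the sample text are hypothetical, and the sketch assumes the marker character really is absent from the input.

```python
import re

MARK = "="  # must not occur anywhere in the input


def remove_marked_blocks(text: str, word: str = "footway") -> str:
    """Apply the three replace-all steps from the answer."""
    if MARK in text:
        raise ValueError("marker character already occurs in the text")
    # Step 1: plant the marker inside every opening and closing tag.
    marked = re.sub(r"(</?P)(lacemark>)", r"\1" + MARK + r"\2", text)
    # Step 2: [^=]* cannot cross a tag, so a match stays in one block.
    block = r"<P=lacemark>[^=]*" + re.escape(word) + r"[^=]*</P=lacemark>"
    marked = re.sub(block, "", marked)
    # Step 3: take the marker out again.
    return marked.replace(MARK, "")


text = "<Placemark> a </Placemark> <Placemark> b footway c </Placemark>"
cleaned = remove_marked_blocks(text)
print(repr(cleaned))
```

The negated character class [^=] matches newlines too, so no "dot matches newline" setting is needed for this variant.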

If there is no easy-to-use single character (i.e. something to use instead of the =), then precede the above steps with some replacements to create an unused character. For example, first replace all & with &amp; then replace all = with &eq;. Now the = is free for use. After the above steps, undo the replacements: first replace all &eq; with = then replace all &amp; with &.
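A small sketch of that escape and restore sequence in Python. The helper names free_char and restore_char are invented for this example; the order of the replacements is the part that matters.

```python
def free_char(text: str) -> str:
    """Escape & first, then =, so that = no longer occurs in the text."""
    return text.replace("&", "&amp;").replace("=", "&eq;")


def restore_char(text: str) -> str:
    """Undo free_char in the reverse order: &eq; first, then &amp;."""
    return text.replace("&eq;", "=").replace("&amp;", "&")


original = "a=b & c &eq; d"
escaped = free_char(original)
restored = restore_char(escaped)
print(escaped)
print(restored)
```

Escaping & before = is what keeps a literal &eq; already present in the file from being confused with an escaped =.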
