Notepad++: delete tags that contain specific text

Question

I have a large XML file with products inside. I'm trying to delete all products which are out of stock. The file size is over 20MB.

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>

...

Is it possible to delete them using Notepad++'s regex, or should I use SimpleXML (PHP) or something similar?

My basic PHP code:

$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));

// iterate over the <product> elements under the root
foreach ($xml->product as $product) {

    // find out-of-stock products and delete them

}
$xml->asXML('output/products.xml');

Answer 1

Foreword

Pattern matching with regular expressions is not ideal for this. If you have access to PHP, I recommend using a proper XML parsing tool. That said, here is a solution you can use in Notepad++.

Description

<product\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>(?:(?!</product).)*<stock\s*(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\s?\/?>no</stock>(?:(?!</product).)*<\/product>

Replace with: nothing

This regular expression will do the following:

find the entire product section
require the subtag stock
require the subtag stock to have a value of no
avoid the edge cases that make pattern matching in XML difficult
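
The behaviour listed above can also be checked outside Notepad++. Here is a minimal PHP sketch, assuming the `s` modifier (so that `.` matches newlines) and a `~` delimiter; the helper name `removeOutOfStock` and the inline sample are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: deletes every <product> block whose <stock> is "no".
function removeOutOfStock(string $xml): string
{
    // Rest of an opening tag: optional attributes (quoted or bare), then ">".
    $tagRest = '\s*(?:[^>=]|=\'[^\']*\'|="[^"]*"|=[^\'"][^\s>]*)*?\s?\/?>';
    // Any run of characters that does not step over a closing </product.
    $inside = '(?:(?!</product).)*';

    // "s" lets "." match newlines, which the multi-line blocks need.
    $pattern = '~<product' . $tagRest . $inside
             . '<stock' . $tagRest . 'no</stock>' . $inside
             . '<\/product>~s';

    $result = preg_replace($pattern, '', $xml);
    if ($result === null) {
        // preg_replace returns null on errors such as hitting the backtrack limit.
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

$sample = <<<'XML'
<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>
XML;

echo removeOutOfStock($sample);
```

On a 20MB file a pattern like this may run into PCRE's backtracking limits, which is one more reason to prefer a parser when PHP is available.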

From Notepad++

Note that you should be using Notepad++ version 6.1 or later, as older versions had problems with regular expressions that have since been fixed.

Press Ctrl+H to open the Replace dialog
Select the Regular expression option and check ". matches newline", since the pattern has to span several lines
In the "Find what" field, enter the regular expression
Leave the "Replace with" field empty
Click Replace All

Example

Live Demo

https://regex101.com/r/cW9nC5/1

Sample text

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>

After Replace

<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  <product                 '<product'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*?                      end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      </product                '</product'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n (any character
                             at all when ". matches newline" is
                             checked)
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  <stock                   '<stock'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the least amount possible)):
----------------------------------------------------------------------
    [^>=]                    any character except: '>', '='
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ='                       '=\''
----------------------------------------------------------------------
    [^']*                    any character except: ''' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    '                        '\''
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    ="                       '="'
----------------------------------------------------------------------
    [^"]*                    any character except: '"' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
    "                        '"'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    =                        '='
----------------------------------------------------------------------
    [^'"]                    any character except: ''', '"'
----------------------------------------------------------------------
    [^\s>]*                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (0 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )*?                      end of grouping
----------------------------------------------------------------------
  \s?                      whitespace (\n, \r, \t, \f, and " ")
                           (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  \/?                      '/' (optional (matching the most amount
                           possible))
----------------------------------------------------------------------
  >no</stock>              '>no</stock>'
----------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
      </product                '</product'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
    .                        any character except \n (any character
                             at all when ". matches newline" is
                             checked)
----------------------------------------------------------------------
  )*                       end of grouping
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  \/                       '/'
----------------------------------------------------------------------
  product>                 'product>'
----------------------------------------------------------------------
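
The (?:(?!</product).)* group in the table is what keeps a match inside a single product. Here is a small PHP sketch that compares it with a plain .*; both patterns are simplified (no attribute handling), and the helper name `matchOffset` and the two-product sample are assumptions made for illustration.

```php
<?php
// Returns the byte offset where $pattern first matches in $text, or -1 if it does not match.
function matchOffset(string $pattern, string $text): int
{
    if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) === 1) {
        return $m[0][1];
    }
    return -1;
}

// An in-stock product followed by an out-of-stock one.
$xml = "<product><name>bla1</name><stock>yes</stock></product>\n"
     . "<product><name>bla2</name><stock>no</stock></product>\n";

// A plain ".*" is free to run across the first </product> ...
$naive = '~<product>.*<stock>no</stock>.*</product>~s';
// ... while the guarded dot refuses to step over any </product.
$tempered = '~<product>(?:(?!</product).)*<stock>no</stock>(?:(?!</product).)*</product>~s';

$naiveOffset = matchOffset($naive, $xml);
$temperedOffset = matchOffset($tempered, $xml);

echo "naive match starts at offset $naiveOffset\n";
echo "guarded match starts at offset $temperedOffset\n";
```

The naive pattern starts its match at the first product, so a replace would delete the in-stock product as well; the guarded pattern can only start at the product that actually contains the `<stock>no</stock>` tag.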

Answer 2

I guess Notepad++ will be easier, i.e.:

FIND : <product>\s+<name>.*?<\/name>\s+<price>.*?<\/price>\s+<stock>no<\/stock>\s+<description>.*?<\/description>\s+<\/product>
REPLACE : with nothing

DEMO

https://regex101.com/r/fH0mM7/1

NOTE

Make sure you check Regular expression at the bottom of the Replace dialog
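
This pattern spells out the children of a product in a fixed order, so it only matches products laid out exactly like the sample. A rough PHP sketch of that behaviour follows; the helper name `removeFixedLayout` and the reordered `bla3` product are invented for illustration.

```php
<?php
// Hypothetical helper applying the fixed-layout pattern from this answer.
function removeFixedLayout(string $xml): string
{
    // No "s" modifier: each ".*?" stays on one line, "\s+" bridges the line breaks.
    $pattern = '~<product>\s+<name>.*?<\/name>\s+<price>.*?<\/price>\s+'
             . '<stock>no<\/stock>\s+<description>.*?<\/description>\s+<\/product>~';

    $result = preg_replace($pattern, '', $xml);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, error code ' . preg_last_error());
    }
    return $result;
}

// Same layout as the sample in the question.
$sample = <<<'XML'
<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>

<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>
XML;

// Made-up product with <stock> placed before <price>.
$reordered = <<<'XML'
<product>
  <name>bla3</name>
  <stock>no</stock>
  <price>70$</price>
  <description>bla</description>
</product>
XML;

echo removeFixedLayout($sample), "\n";
echo removeFixedLayout($reordered), "\n";
```

The second call leaves the product untouched even though it is out of stock, so this approach is only safe when every product in the file follows the same tag order and carries no attributes.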

Answer 3

You can do this in PHP with the code below:

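<?php
$url = 'input/products.xml';
$xml = new SimpleXMLElement(file_get_contents($url));

// Walk backwards so that removing a node does not shift the indexes still to visit.
for ($i = count($xml->product) - 1; $i >= 0; --$i) {
    $product = $xml->product[$i];
    // Cast to string so the comparison is on the element's text.
    if ((string) $product->stock === 'no') {
        unset($xml->product[$i]);
    }
}
$xml->asXML('output/products.xml');
?>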
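
The same filtering can be written with an XPath query, which avoids the index bookkeeping. This is only a sketch using PHP's DOM extension rather than SimpleXML; the helper name `removeOutOfStockDom` and the `<products>` root element are assumptions, since the question does not show the root of the file.

```php
<?php
// Hypothetical helper: returns the XML with every out-of-stock <product> removed.
function removeOutOfStockDom(string $xmlString): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xmlString)) {
        throw new RuntimeException('Could not parse the XML input');
    }

    $xpath = new DOMXPath($doc);
    // Every <product> that has a <stock> child whose text is exactly "no".
    $nodes = $xpath->query('//product[stock = "no"]');

    // Copy the node list first, then detach each node from its parent.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveXML();
}

// Assumed root element name; adjust to the real file.
$sample = <<<'XML'
<products>
<product>
  <name>bla1</name>
  <price>50$</price>
  <stock>yes</stock>
  <description>bla</description>
</product>
<product>
  <name>bla2</name>
  <price>60$</price>
  <stock>no</stock>
  <description>bla</description>
</product>
</products>
XML;

echo removeOutOfStockDom($sample);
```

For the real file, the input would come from file_get_contents('input/products.xml') and the result could be written with file_put_contents('output/products.xml', ...).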

3 answers

Answer 1 (2, accepted, 2016-05-30 14:45:40)
Answer 2 (1, 2016-05-30 14:50:17)
Answer 3 (1, 2016-05-30 14:50:51)
