XML removing closing/opening tags

I have a problem with a piece of XML that I want to parse with PHP. Here is my example:

<tags>
    <content>content</content>
    <amplifications>
        <tag>content 1</tag>
    </amplifications>
    <amplifications>
        <tag>content 2</tag>
        <tag>content 3</tag>
        <tag>content 4</tag>
        <tag>content 5</tag>
    </amplifications>
</tags>

I want to remove these tags:

</amplifications>
<amplifications>

I've tried using preg_replace, but I can't get it to work because the tags are indented differently and there is whitespace between them.

This should help you.

str_replace("</", "<", $XMLData);

The first problem you might encounter is that, by default, preg_replace does not treat the subject string as several separate lines.

You can add a modifier (http://php.net/manual/en/reference.pcre.pattern.modifiers.php) to change this:

m (PCRE_MULTILINE)

By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m modifier. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.
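Here is a small sketch of what the m modifier changes, using a trimmed, hypothetical excerpt of the question's XML. Without m, ^ can only match at offset 0. With m, it also matches right after each newline, so an indented line can be anchored. Note that \s matches "\n" whether or not m is set: the modifier only affects where ^ and $ can match.

```php
<?php
// Hypothetical excerpt of the question's XML, with its indentation
$xml = "<tags>\n    <amplifications>\n    </amplifications>\n</tags>";

// Without /m, ^ only matches at the very start of the string,
// which begins with "<tags>", so there is no match
$withoutM = preg_match_all('/^\s*<amplifications>/', $xml);

// With /m, ^ also matches right after every "\n", so the
// indented "    <amplifications>" line is found once
$withM = preg_match_all('/^\s*<amplifications>/m', $xml);

echo "without m: $withoutM, with m: $withM\n";
```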

After that, you have to be careful when writing your regexp. Something like this could happen:

<amplifications>
    <amplifications>
    </amplifications>
</amplifications>

You don't want to match the first <amplifications> with the first </amplifications>. If this case can't happen, your regexp will be easier to write.
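To see the pitfall concretely, here is a sketch that runs a naive non-greedy pattern (an illustrative choice, not something from the question) over the nested snippet. The s modifier lets . cross newlines, and the lazy .*? stops at the first closing tag it reaches, which belongs to the inner element.

```php
<?php
// The nested case, as a single string
$nested = "<amplifications>\n    <amplifications>\n    </amplifications>\n</amplifications>";

// Naive pairing: opening tag, as little as possible, closing tag
preg_match('#<amplifications>.*?</amplifications>#s', $nested, $m);

// The match starts at the OUTER opening tag but ends at the INNER
// closing tag, so it holds two opening tags and only one closing tag
$opens  = substr_count($m[0], '<amplifications>');
$closes = substr_count($m[0], '</amplifications>');

echo "opening tags: $opens, closing tags: $closes\n";
```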

I can add details if you want, but that should already help you a bit.
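For the flat case in the question (no nested amplifications elements), a minimal sketch could look like the following. \s* matches any run of spaces, tabs and newlines, so it does not matter how the two tags are indented, and no modifier is needed. It assumes nesting never occurs; with nested elements, the pairing problem described above comes back.

```php
<?php
// The XML from the question
$xml = <<<XML
<tags>
    <content>content</content>
    <amplifications>
        <tag>content 1</tag>
    </amplifications>
    <amplifications>
        <tag>content 2</tag>
        <tag>content 3</tag>
        <tag>content 4</tag>
        <tag>content 5</tag>
    </amplifications>
</tags>
XML;

// Remove a closing tag immediately followed (after any whitespace)
// by an opening tag of the same name
$merged = preg_replace('#</amplifications>\s*<amplifications>#', '', $xml);

echo $merged;
```

The whitespace that sat before the removed closing tag stays behind as a blank indented line. The result is still well-formed, and the DOM approach below re-indents it if that matters.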

Merge all children of all elements with a specific tag name into the first element:

Example XML:

<tags>
    <content>content</content>
    <amplifications>
        <tag>content 1</tag>
    </amplifications>
    <amplifications>
        <tag>content 2</tag>
        <tag>content 3</tag>
        <tag>content 4</tag>
        <tag>content 5</tag>
    </amplifications>
</tags>

PHP example:

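$doc = new DOMDocument();
$doc->formatOutput = true;
// Drop whitespace-only text nodes so formatOutput can re-indent cleanly
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml); // $xml holds the example XML above

$name = 'amplifications';
// getElementsByTagName() returns a live list; copy it into an array so
// removing elements inside the loop does not make it skip any of them
$elements = iterator_to_array($doc->getElementsByTagName($name));
$first    = array_shift($elements);

foreach ($elements as $parent) {
    // Copy childNodes too, since it shrinks as children are moved out
    foreach (iterator_to_array($parent->childNodes) as $child) {
        $first->appendChild($child);
    }
    // Remove the now-empty duplicate element
    $parent->parentNode->removeChild($parent);
}

echo $doc->saveXML();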

Output:

<?xml version="1.0"?>
<tags>
  <content>content</content>
  <amplifications>
    <tag>content 1</tag>
    <tag>content 2</tag>
    <tag>content 3</tag>
    <tag>content 4</tag>
    <tag>content 5</tag>
  </amplifications>
</tags>
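A variant of the same idea, sketched with DOMXPath instead of getElementsByTagName, is shown below. DOMXPath::query() returns a snapshot rather than a live list, so nodes can be removed while looping over the result. The input has three amplifications elements (a made-up variation of the example) to show that more than one duplicate gets merged.

```php
<?php
// Made-up input with three <amplifications> siblings
$xml = '<tags><content>content</content>'
     . '<amplifications><tag>content 1</tag></amplifications>'
     . '<amplifications><tag>content 2</tag><tag>content 3</tag></amplifications>'
     . '<amplifications><tag>content 4</tag></amplifications>'
     . '</tags>';

$doc = new DOMDocument();
$doc->formatOutput = true;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);
// (//amplifications)[1] is the first match in document order
$first = $xpath->query('(//amplifications)[1]')->item(0);

if ($first !== null) {
    // Every later match, in document order; the result is a static snapshot
    foreach ($xpath->query('(//amplifications)[position() > 1]') as $dup) {
        // appendChild() moves the node, so firstChild advances each time
        while ($dup->firstChild !== null) {
            $first->appendChild($dup->firstChild);
        }
        $dup->parentNode->removeChild($dup);
    }
}

echo $doc->saveXML();
```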

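Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM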