
PHP: How to parse/get XML outerText with SimpleXML

I'm trying to parse a large XML file to put the contents in my database. My question is simple, although I find it difficult to find a nice and clean solution.

Imagine the following XML string:

<tag1>
    OuterText <tag2>InnerText</tag2>
</tag1>

Edit: The question is: how do I catch the OuterText in a string?

I could just remove the tags of tag1, and the tags and content of tag2, using regex, but so far I've been using SimpleXML, so I'd prefer an answer that goes nicely with this practice.

Okay, looks like I asked this question too fast. I messed around a bit using my own simplified example and this is what I found. It actually works, even with text and a child element mixed inside tag1.

$xml = "<tag1>
          OuterText <tag2>InnerText</tag2>
        </tag1>";

$sxe = new SimpleXMLElement($xml);

$out = (string)$sxe;
$in = (string)$sxe->tag2;

// output:
// OuterText
// InnerText
echo "$out<br>$in";

Edit: With OuterText on both sides of the inline tag, this method produces the following result:

$xml = "<tag1>
          OuterText1 <tag2>InnerText</tag2> OuterText2
        </tag1>";
// output will then be:
// OuterText1 OuterText2 ($out)
// InnerText ($in)
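
If the two outer pieces need to be stored separately rather than as one concatenated string, one option is to walk the element's direct child nodes through the DOM extension. This is only a sketch: `outer_text_parts` is a hypothetical helper name, and it assumes the DOM extension is enabled alongside SimpleXML.

```php
<?php
$xml = "<tag1>
          OuterText1 <tag2>InnerText</tag2> OuterText2
        </tag1>";

$sxe = new SimpleXMLElement($xml);

// Hypothetical helper: returns the direct text children of a node,
// trimmed and in document order, skipping whitespace-only nodes.
function outer_text_parts(SimpleXMLElement $node): array
{
    $parts = [];
    foreach (dom_import_simplexml($node)->childNodes as $child) {
        // DOMText also covers CDATA sections; element children are skipped.
        if ($child instanceof DOMText) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                $parts[] = $text;
            }
        }
    }
    return $parts;
}

$parts = outer_text_parts($sxe);
echo implode(' | ', $parts), "\n"; // OuterText1 | OuterText2
```

The string cast stays the simpler choice when a single concatenated value is enough; the loop only helps when the position of each piece relative to tag2 matters.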

Something like this should work:

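// Assumes the <tag1> elements sit under a wrapper root and $pdo is an open PDO connection.
$yourinput = new SimpleXMLElement($xmlstr);
$stmt = $pdo->prepare("INSERT INTO `table` (field1, field2) VALUES (?, ?)");
foreach ($yourinput->tag1 as $curtag) {
    // Bound parameters replace the removed mysql_query() and the unquoted interpolation.
    $stmt->execute([(string)$curtag, (string)$curtag->tag2]);
}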

If I understand the question correctly, you want all the text content of a tag, in order, but without any inner XML tags.

It's not particularly elegant, but this would theoretically do the trick:

$inner_text = strip_tags($some_simplexml_node->asXML()); 

The trick here is that SimpleXML can serialize any fragment of XML (e.g. a single node that you've found while traversing the document) back into XML; removing all tags from that should then give you all the text content, in the right order.
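
One caveat with the serialize-and-strip approach is that entities stay escaped in the serialized XML. As a possible alternative, and assuming the DOM extension is available, the node's `textContent` gives the same in-order text with entities decoded. The sample XML and variable names below are made up for illustration.

```php
<?php
// Sample input with an escaped ampersand inside the inner tag.
$xml = "<tag1>OuterText1 <tag2>Inner &amp; Text</tag2> OuterText2</tag1>";
$node = new SimpleXMLElement($xml);

// Approach from the answer above: serialize, then strip the markup.
// Entities such as &amp; are left escaped in the result.
$viaStripTags = strip_tags($node->asXML());

// DOM alternative: all descendant text in document order, entities decoded.
$viaDom = dom_import_simplexml($node)->textContent;

echo $viaDom, "\n"; // OuterText1 Inner & Text OuterText2
```

Either way the inner text of tag2 is included, so this answers "all text without tags", not "only the outer text".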

You won't be able to use SimpleXML or anything similar for this, as it is not valid XML to have this text contained outside of any element. Is this intentional or an error in the XML generation (not sure where you are getting the XML from)?
