
How to read multiple XML documents in one file using PHP

I'm dealing with this kind of XML sequence file. Can anyone suggest how to parse it?

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>dddd</name>
<document-id>
<country>US</country>
<doc-number>D0629998</doc-number>
<kind>S2</kind>
<date>20110104</date>
</document-id>

That's not a valid XML file. It looks like two files in one, but even then each part is invalid, because it has more than one top-level element (<name> and <document-id>). Assuming those are two separate files, you could try "tidying" them first. Assuming $xml is a string containing the XML contents:

$xml = tidy_repair_string($xml, array(
    'output-xml' => true,
    'input-xml' => true
)); 

Then you could use SimpleXML on it:

$xml = new SimpleXMLElement($xml);
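If the tidy extension is not available, a rough alternative is to split the string before each XML declaration and wrap every piece in a synthetic root element, so that the sibling top-level elements become parseable. This is only a sketch: the helper names splitXmlSequence and parseFragment and the <record> wrapper element are made up for illustration, and the DOCTYPE pattern assumes the internal subset contains no ">" character.

```php
<?php
// Hypothetical helper: cut a string of concatenated XML documents into one
// string per document, splitting just before each XML declaration.
function splitXmlSequence(string $data): array
{
    $parts = preg_split('/(?=<\?xml\s)/', $data, -1, PREG_SPLIT_NO_EMPTY);
    // Drop pieces that consist of whitespace only
    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

// Hypothetical helper: remove the declaration and DOCTYPE, then wrap the
// remaining top-level elements in a synthetic <record> root.
function parseFragment(string $doc): SimpleXMLElement
{
    $body = preg_replace('/^<\?xml[^>]*\?>\s*/', '', $doc);
    $body = preg_replace('/^<!DOCTYPE[^>]*>\s*/', '', $body);
    return new SimpleXMLElement('<record>' . $body . '</record>');
}

// The sample from the question
$data = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>ccccc</name>
<document-id>
<country>US</country>
<doc-number>D0629997</doc-number>
<kind>S1</kind>
<date>20110104</date>
</document-id>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v42-2006-08-23.dtd" [ ]>
<name>dddd</name>
<document-id>
<country>US</country>
<doc-number>D0629998</doc-number>
<kind>S2</kind>
<date>20110104</date>
</document-id>
XML;

foreach (splitXmlSequence($data) as $doc) {
    $record = parseFragment($doc);
    echo $record->{'document-id'}->{'doc-number'}, ' ', $record->name, PHP_EOL;
}
```

For the sample above this prints one line per document (doc-number followed by name). Since the whole input is held in memory, this approach only suits small files.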

I know where this XML file has come from, and I find it quite strange that Google would provide invalid XML (unless they are simply hosting a file that they got from somewhere else). This suggestion for parsing it worked for me: How to parse an xml file with multiple xml declaration using PHP? (A concatenation of several XML files)

That file contains a sequence of XML documents concatenated to each other. You need to register a PHP stream wrapper that transparently divides the file for you; then you can process each document individually, even in a streaming fashion. Example:

// XMLSequenceStream and XMLReaderNode are provided by the XMLReaderIterator library
stream_wrapper_register('xmlseq', 'XMLSequenceStream');

$path = "xmlseq://zip://ipg140107.zip#ipg140107.xml";

while (XMLSequenceStream::notAtEndOfSequence($path)) {
    $reader = new XMLReader();
    $reader->open($path);
    // just consume the whole document
    while ($reader->next()) {
        XMLReaderNode::dump($reader);
    }
}

XMLSequenceStream::clean();

That stream wrapper is part of the XMLReaderIterator library and works just as well with SimpleXMLElement or DOMDocument, although for larger files XMLReader is a better fit.
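To illustrate the general idea of handing each document to a parser one at a time, here is a minimal sketch that does not use the library. It is not how XMLSequenceStream is implemented: readXmlDocuments is a hypothetical generator, and it assumes every document starts with an XML declaration at the beginning of a line. The demo input is two small made-up documents in an in-memory stream.

```php
<?php
// Hypothetical generator: yields one XML document string at a time from an
// open stream, starting a new document whenever a line begins with "<?xml".
function readXmlDocuments($stream): Generator
{
    $buffer = '';
    while (($line = fgets($stream)) !== false) {
        if (strncmp($line, '<?xml', 5) === 0 && trim($buffer) !== '') {
            yield $buffer;
            $buffer = '';
        }
        $buffer .= $line;
    }
    // Emit the last document in the sequence
    if (trim($buffer) !== '') {
        yield $buffer;
    }
}

// Demo input (made up): two well-formed documents concatenated in memory
$stream = fopen('php://memory', 'w+');
fwrite($stream, "<?xml version=\"1.0\"?>\n<doc><id>1</id></doc>\n"
    . "<?xml version=\"1.0\"?>\n<doc><id>2</id></doc>\n");
rewind($stream);

$ids = [];
foreach (readXmlDocuments($stream) as $xml) {
    // Each chunk is a complete document, so DOMDocument can load it
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    $ids[] = $dom->getElementsByTagName('id')->item(0)->textContent;
}
fclose($stream);

echo implode(',', $ids), PHP_EOL; // prints "1,2"
```

Only one document is buffered at a time, so memory use depends on the largest single document rather than on the whole file. With the zip extension installed, the same generator should also accept a stream opened with fopen('zip://ipg140107.zip#ipg140107.xml', 'r').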

For the file I used in my example ( http://storage.googleapis.com/patents/grant_full_text/2014/ipg140107.zip from https://www.google.com/googlebooks/uspto-patents-grants-text.html ), the overall element structure, counting the elements across the different trees in that sequence, is for example:

\-us-patent-grant (473)
  |-us-bibliographic-data-grant (473)
  | |-publication-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   |-kind (473)
  | |   \-date (473)
  | |-application-reference (473)
  | | \-document-id (473)
  | |   |-country (473)
  | |   |-doc-number (473)
  | |   \-date (473)
  | |-us-application-series-code (473)
  | |-us-term-of-grant (470)
  | | |-length-of-grant (450)
  | | |-disclaimer (18)
  | | | \-text (18)
  | | \-us-term-extension (20)
  | |-classification-locarno (450)
  | | |-edition (450)
  | | \-main-classification (450)
  | |-classification-national (473)
  | | |-country (473)
  | | |-main-classification (473)
  | | \-further-classification (143)
  | |-invention-title (473)
  | | \-i (12)
  | |-us-references-cited (458)
  | | \-us-citation (11000)
  | |   |-patcit (10265)
  | |   | \-document-id (10265)
  | |   |   |-country (10265)
  | |   |   |-doc-number (10265)
  | |   |   |-kind (9884)
  | |   |   |-name (9811)
  | |   |   \-date (10264)
  | |   |-category (10999)
  | |   |-classification-national (6309)
  | |   | |-country (6309)
  | |   | \-main-classification (6309)
  | |   |-nplcit (735)
  | |   | \-othercit (735)
  | |   |   |-sub (281)
  | |   |   |-i (7)
  | |   |   \-sup (1)
  | |   \-classification-cpc-text (1)
  | |-number-of-claims (472)
  | |-us-exemplary-claim (472)
  | |-us-field-of-classification-search (472)
  | | \-classification-national (8991)
  | |   |-country (8991)
  | |   |-main-classification (8991)
  | |   \-additional-info (1205)
  | |-figures (472)
  | | |-number-of-drawing-sheets (472)
  | | \-number-of-figures (472)
  | |-us-parties (472)
  | | |-us-applicants (472)
  | | | \-us-applicant (765)
  | | |   |-addressbook (765)
  | | |   | |-last-name (573)
  | | |   | |-first-name (573)
  | | |   | |-address (765)
  | | |   | | |-city (765)
  | | |   | | |-country (765)
  | | |   | | \-state (423)
  | | |   | \-orgname (192)
  | | |   \-residence (765)
  | | |     \-country (765)
  | | |-inventors (472)
  | | | \-inventor (969)
  | | |   \-addressbook (969)
  | | |     |-last-name (969)
  | | |     |-first-name (969)
  | | |     \-address (969)
  | | |       |-city (969)
  | | |       |-country (969)
  | | |       \-state (519)
  | | \-agents (429)
  | |   \-agent (500)
  | |     \-addressbook (500)
  | |       |-orgname (361)
  | |       |-address (500)
  | |       | \-country (500)
  | |       |-last-name (139)
  | |       \-first-name (139)
  | |-assignees (385)
  | | \-assignee (391)
  | |   |-addressbook (390)
  | |   | |-orgname (386)
  | |   | |-role (390)
  | |   | |-address (390)
  | |   | | |-city (355)
  | |   | | |-country (390)
  | |   | | \-state (192)
  | |   | |-last-name (4)
  | |   | \-first-name (4)
  | |   |-orgname (1)
  | |   \-role (1)
  | |-examiners (472)
  | | |-primary-examiner (472)
  | | | |-last-name (472)
  | | | |-first-name (472)
  | | | \-department (472)
  | | \-assistant-examiner (65)
  | |   |-last-name (65)
  | |   \-first-name (65)
  | |-us-related-documents (65)
  | | |-continuation-in-part (16)
  | | | \-relation (16)
  | | |   |-parent-doc (16)
  | | |   | |-document-id (16)
  | | |   | | |-country (16)
  | | |   | | |-doc-number (16)
  | | |   | | \-date (16)
  | | |   | |-parent-status (11)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (16)
  | | |     \-document-id (16)
  | | |       |-country (16)
  | | |       \-doc-number (16)
  | | |-continuation (21)
  | | | \-relation (21)
  | | |   |-parent-doc (21)
  | | |   | |-document-id (21)
  | | |   | | |-country (21)
  | | |   | | |-doc-number (21)
  | | |   | | \-date (21)
  | | |   | |-parent-status (16)
  | | |   | \-parent-grant-document (5)
  | | |   |   \-document-id (5)
  | | |   |     |-country (5)
  | | |   |     |-doc-number (5)
  | | |   |     \-date (2)
  | | |   \-child-doc (21)
  | | |     \-document-id (21)
  | | |       |-country (21)
  | | |       \-doc-number (21)
  | | |-division (32)
  | | | \-relation (32)
  | | |   |-parent-doc (32)
  | | |   | |-document-id (32)
  | | |   | | |-country (32)
  | | |   | | |-doc-number (32)
  | | |   | | \-date (32)
  | | |   | |-parent-grant-document (24)
  | | |   | | \-document-id (24)
  | | |   | |   |-country (24)
  | | |   | |   |-doc-number (24)
  | | |   | |   \-date (1)
  | | |   | \-parent-status (8)
  | | |   \-child-doc (32)
  | | |     \-document-id (32)
  | | |       |-country (32)
  | | |       \-doc-number (32)
  | | \-related-publication (9)
  | |   \-document-id (9)
  | |     |-country (9)
  | |     |-doc-number (9)
  | |     |-kind (9)
  | |     \-date (9)
  | |-priority-claims (140)
  | | \-priority-claim (182)
  | |   |-country (182)
  | |   |-doc-number (182)
  | |   \-date (182)
  | |-us-sir-flag (1)
  | |-classifications-ipcr (23)
  | | \-classification-ipcr (24)
  | |   |-ipc-version-indicator (24)
  | |   | \-date (24)
  | |   |-classification-level (24)
  | |   |-section (24)
  | |   |-class (24)
  | |   |-subclass (24)
  | |   |-main-group (24)
  | |   |-subgroup (24)
  | |   |-symbol-position (24)
  | |   |-classification-value (24)
  | |   |-action-date (24)
  | |   | \-date (24)
  | |   |-generating-office (24)
  | |   | \-country (24)
  | |   |-classification-status (24)
  | |   \-classification-data-source (24)
  | |-us-botanic (21)
  | | |-latin-name (21)
  | | \-variety (21)
  | \-classifications-cpc (1)
  |   \-main-cpc (1)
  |     \-classification-cpc (1)
  |       |-cpc-version-indicator (1)
  |       | \-date (1)
  |       |-section (1)
  |       |-class (1)
  |       |-subclass (1)
  |       |-main-group (1)
  |       |-subgroup (1)
  |       |-symbol-position (1)
  |       |-classification-value (1)
  |       |-action-date (1)
  |       | \-date (1)
  |       |-generating-office (1)
  |       | \-country (1)
  |       |-classification-status (1)
  |       |-classification-data-source (1)
  |       \-scheme-origination-code (1)
  |-drawings (472)
  | \-figure (3033)
  |   \-img (3033)
  |-description (472)
  | |-description-of-drawings (472)
  | | |-p (3955)
  | | | |-figref (4478)
  | | | |-b (86)
  | | | \-i (6)
  | | \-heading (22)
  | |-heading (162)
  | \-p (340)
  |   |-figref (15)
  |   |-b (250)
  |   |-i (146)
  |   |-ul (96)
  |   | \-li (444)
  |   |   |-ul (215)
  |   |   | \-li (273)
  |   |   |   |-ul (199)
  |   |   |   | \-li (1192)
  |   |   |   |   |-i (1219)
  |   |   |   |   |-b (1)
  |   |   |   |   |-sup (10)
  |   |   |   |   \-sub (2)
  |   |   |   \-i (11)
  |   |   |-sup (2)
  |   |   \-i (26)
  |   |-tables (15)
  |   | \-table (15)
  |   |   \-tgroup (49)
  |   |     |-colspec (175)
  |   |     |-thead (15)
  |   |     | \-row (27)
  |   |     |   \-entry (51)
  |   |     \-tbody (49)
  |   |       \-row (291)
  |   |         \-entry (997)
  |   |           \-sup (28)
  |   \-sup (2)
  |-us-claim-statement (472)
  |-claims (472)
  | \-claim (476)
  |   \-claim-text (476)
  |     |-figref (1)
  |     |-claim-text (5)
  |     |-claim-ref (4)
  |     \-i (15)
  \-abstract (22)
    \-p (22)
      |-i (27)
      \-ul (2)
        \-li (2)
          \-ul (2)
            \-li (11)
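
A count like the one above can be approximated with plain XMLReader by tracking the current element path and tallying each path across all documents. The following is a sketch under assumptions, not the code that produced the listing: countElementPaths is a hypothetical helper, the two input documents are made up, and the output is a flat list of paths rather than a drawn tree.

```php
<?php
// Hypothetical helper: count how often each element path occurs in one XML
// document, adding to the counts passed in from earlier documents.
function countElementPaths(string $xml, array $counts = []): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $stack = [];
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        // Trim the stack to the current depth, then append this element
        $stack = array_slice($stack, 0, $reader->depth);
        $stack[] = $reader->name;
        $path = implode('/', $stack);
        $counts[$path] = ($counts[$path] ?? 0) + 1;
    }
    $reader->close();
    return $counts;
}

// Made-up documents standing in for the members of the sequence
$docs = [
    '<grant><document-id><country>US</country><kind>S1</kind></document-id></grant>',
    '<grant><document-id><country>US</country></document-id></grant>',
];

$counts = [];
foreach ($docs as $doc) {
    $counts = countElementPaths($doc, $counts);
}

foreach ($counts as $path => $n) {
    echo $path, ' (', $n, ')', PHP_EOL;
}
```

Here "grant/document-id/country" is counted twice and "grant/document-id/kind" once, which mirrors how optional elements show lower counts in the listing above.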
