
XML parsing using SimpleXML

I am trying to parse the XML found on this page:

http://www.rapleaf.com/apidoc/person

Name: Test Dummy
Age: 42
gender: Male
Address: San Francisco, CA, US
Occupation:
University: Berkeley
first seen: 2006-02-23
last seen: 2008-09-25
Friends: 42
Name:
Age:
gender:
Address:
Occupation:
University:
first seen:
last seen:
Friends: 

1) I had to remove the records where "&" was found. Only after that could I process the page.

2) I could not parse the "membership site", nor could I parse "occupation".

3) I am getting 2 records when I am expecting only one.

4) How do I insert these records into the database?

<?php

// displays all the file nodes
if(!$xml=simplexml_load_file('rapleaf.xml')){
    trigger_error('Error reading XML file',E_USER_ERROR);
}

foreach($xml as $user){
    echo 'Name: '.$user->name. '
<br /> Age: '.$user->age.'
<br /> gender: '.$user->gender.'
<br /> Address: '.$user->location.'
<br /> Occupation: '.$user->occupations->occupation->company.'
<br /> University: '.$user->universities->university.'
<br /> first seen: '.$user->earliest_known_activity.'
<br /> last seen: '.$user->latest_known_activity.'
<br /> Friends: '.$user->num_friends.'
<br />';
}

?>

To be able to parse that document (which is not well-formed), I would recommend doing the following:

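// Read the raw file so the stray ampersands can be escaped first
$xmlString = file_get_contents('rapleaf.xml');
// Escape only bare "&" characters; existing entities such as &amp; are left alone
$xmlString = preg_replace('/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/', '&amp;', $xmlString);

// simplexml_load_string() returns false on failure (an empty element is also falsy, so compare strictly)
if(($xml = simplexml_load_string($xmlString)) === false){
    trigger_error('Error reading XML file', E_USER_ERROR);
}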

First read the file into a string, then replace the ampersand characters (within the link) with their entity. Then you can use the simplexml_load_string() function to create the XML object.
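A self-contained sketch of why the escaping step is needed. The sample XML and the helper name escapeBareAmpersands() are made up for illustration; the helper only escapes ampersands that do not already begin an entity or character reference, so text that is already escaped is not double-escaped:

```php
<?php
// Hypothetical sample: an unescaped "&" inside a URL, similar to the links in the Rapleaf response.
$raw = '<person><profile_url>http://example.com/?a=1&b=2</profile_url></person>';

// Hypothetical helper: escape only ampersands that do not already start an entity/character reference.
function escapeBareAmpersands(string $xml): string
{
    return preg_replace('/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/', '&amp;', $xml);
}

// Collect libxml errors internally so the failing parse does not print warnings.
libxml_use_internal_errors(true);

$broken = simplexml_load_string($raw);                        // false: "&b=2" is not a valid reference
$fixed  = simplexml_load_string(escapeBareAmpersands($raw));  // parses fine

var_dump($broken === false);
echo (string)$fixed->profile_url, "\n";  // the entity is decoded again when read
```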

Now you are able to parse the document. As far as I can see, there is only one person in each file, so you don't need a foreach loop. You can parse all the fields; you just have to know how. Here is a more complex example that parses different things with different methods:

echo '    Name: '.(string)$xml->basics->name. '
        <br /> Age: '.(string)$xml->basics->age.'
        <br /> gender: '.(string)$xml->basics->gender.'
        <br /> Address: '.(string)$xml->basics->location;
// There might be more than one occupation: loop over the <occupation> children
foreach($xml->occupations->occupation as $occupation){
    echo '<br /> Occupation: '.$occupation->attributes()->title;
    if(isset($occupation->attributes()->company)){
        echo '; at company: '.$occupation->attributes()->company;
    }
}
// There might be more than one university: loop over the <university> children
foreach($xml->universities->university as $university){
    echo '<br /> University: '.$university;
}
echo    '<br /> first seen: '.(string)$xml->basics->earliest_known_activity.'
        <br /> last seen: '.(string)$xml->basics->latest_known_activity.'
        <br /> Friends: '.(string)$xml->basics->num_friends;
// getting all the primary membership pages
foreach($xml->memberships->primary->membership as $membership){
    // cast the attribute to a string before comparing it
    if((string)$membership->attributes()->exists === 'true'){
        echo '<br />'.$membership->attributes()->site;
        if(isset($membership->attributes()->profile_url)){
            echo ' | '.$membership->attributes()->profile_url;
        }
        if(isset($membership->attributes()->num_friends)){
            echo ' | '.$membership->attributes()->num_friends;
        }
    }
}

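For the text contained in a tag, cast the element to a string. echo and string concatenation do this implicitly, but you need an explicit cast when you store the value in a variable or compare it: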

echo 'Name: '.(string)$xml->basics->name;
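A minimal sketch of the difference the cast makes, using a made-up snippet rather than the real Rapleaf response:

```php
<?php
// Invented sample shaped like the <basics> part discussed above.
$xml = simplexml_load_string('<person><basics><name>Test Dummy</name><age>42</age></basics></person>');

$node = $xml->basics->name;          // still a SimpleXMLElement object
$name = (string)$xml->basics->name;  // a plain PHP string
$age  = (int)$xml->basics->age;      // numeric text can be cast to int the same way

var_dump($node === 'Test Dummy');    // an object is never identical to a string
var_dump($name === 'Test Dummy');
var_dump($age + 1);
```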

To get the value of an attribute of a tag, use the attributes() method. Since the value below is only echoed, no explicit cast is needed:

echo 'Occupation: '.$xml->occupations->occupation[0]->attributes()->title;

As you can see, you can also get a specific child node by its index, because repeated child elements can be accessed like an array. If you only need one child node, you don't need a loop for that.
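A small sketch of both techniques on a hypothetical <occupations> snippet (the sample data is invented). The array-style $element['attr'] access is a standard SimpleXML alternative to attributes():

```php
<?php
// Invented sample: two <occupation> elements whose data lives in attributes.
$xml = simplexml_load_string(
    '<person><occupations>'
    . '<occupation title="Engineer" company="Acme"/>'
    . '<occupation title="Writer"/>'
    . '</occupations></person>'
);

$occupations = $xml->occupations->occupation;

// Repeated children can be counted and indexed like an array.
$count = count($occupations);

// Attribute via attributes(), as in the answer above.
$firstTitle = (string)$occupations[0]->attributes()->title;

// SimpleXML also accepts array syntax for attributes on the element itself.
$firstCompany = (string)$occupations[0]['company'];
$secondTitle  = (string)$occupations[1]['title'];
```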

But always make sure that the element you call attributes() on actually exists; otherwise an error will be thrown. You may want to check that with isset() first.
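One way to guard the access, sketched with a hypothetical helper occupationTitle() and invented sample documents. isset() on the whole chain returns false instead of failing when an element along the way is missing:

```php
<?php
// Hypothetical helper: returns the first occupation title, or 'unknown' if there is none.
function occupationTitle(SimpleXMLElement $person): string
{
    // isset() on the whole chain is false when any element along the way is missing.
    if (isset($person->occupations->occupation[0])) {
        return (string)$person->occupations->occupation[0]->attributes()->title;
    }
    return 'unknown';
}

// Invented sample documents: one with an occupation, one without.
$with    = simplexml_load_string('<person><occupations><occupation title="Engineer"/></occupations></person>');
$without = simplexml_load_string('<person><basics><name>Test Dummy</name></basics></person>');

echo occupationTitle($with), "\n";
echo occupationTitle($without), "\n";
```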

I hope you now have an idea of how to parse XML using SimpleXML. If you have any further questions, just ask here or post a new question.

1. Ampersands are part of the XML syntax (they start entity and character references such as &amp; or &#233;). Therefore, they cannot appear on their own in XML documents. They have to be encoded as &amp;, or they have to be enclosed in a CDATA block: http://www.w3schools.com/xmL/xml_cdata.asp
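A small sketch showing that both options yield the same string once parsed (the sample data is invented). LIBXML_NOCDATA is a libxml option that merges CDATA sections into ordinary text nodes:

```php
<?php
// Invented sample: the same URL once escaped as &amp; and once wrapped in CDATA.
$xmlString = '<links>'
    . '<escaped>http://example.com/?a=1&amp;b=2</escaped>'
    . '<cdata><![CDATA[http://example.com/?a=1&b=2]]></cdata>'
    . '</links>';

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$xml = simplexml_load_string($xmlString, 'SimpleXMLElement', LIBXML_NOCDATA);

// Both forms yield the same literal string once parsed.
$escaped = (string)$xml->escaped;
$cdata   = (string)$xml->cdata;

var_dump($escaped === $cdata);
```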

2. You cannot read the company like that ($user->occupations->occupation->company), because company is an attribute of the occupation element, not a child element. You will have to do something like:

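// <occupation> is a child of <occupations>
$a = $user->occupations->children();
// company is an attribute of <occupation>, not a child element
$b = $a->occupation->attributes();
$c = (string)$b->company;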

Check out http://php.net/manual/de/book.simplexml.php for more information.
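A self-contained sketch of the difference, assuming company is stored as an attribute of <occupation> (the sample document is invented). It also reproduces the empty "Occupation:" line from the question:

```php
<?php
// Invented sample: company and title are attributes of <occupation>.
$user = simplexml_load_string(
    '<basics><occupations><occupation title="Engineer" company="Acme"/></occupations></basics>'
);

// Wrong: looks for a child element <company>, which does not exist, so the result is empty.
$wrong = (string)$user->occupations->occupation->company;

// Right: read company from the attributes of <occupation>.
$a = $user->occupations->children();
$b = $a->occupation->attributes();
$c = (string)$b->company;

echo "'" . $wrong . "' vs '" . $c . "'\n";
```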

3. You are getting two records because an XML document always has a single root element that encloses all the other elements. When you iterate over $xml with foreach, you iterate over the children of that root element, so you get one SimpleXMLElement object per child element, and only the first of them holds the person's details.
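A sketch of what foreach actually visits, using a hypothetical document whose root has two children (the element names are chosen for illustration only):

```php
<?php
// Hypothetical document: the root <person> has two child elements.
$xml = simplexml_load_string(
    '<person>'
    . '<basics><name>Test Dummy</name></basics>'
    . '<memberships><primary/></memberships>'
    . '</person>'
);

// foreach over the root element visits each child element in turn; the key is the tag name.
$visited = [];
foreach ($xml as $tagName => $child) {
    $visited[] = $tagName;
}
echo implode(', ', $visited), "\n";

// Only the first child carries the person's fields, so address it directly instead of looping.
$name = (string)$xml->basics->name;
```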

4. This really is a separate question, and it depends on which database you want to use. Google will help you with that. You'll probably want to use MySQL, since you are working with PHP. So check out http://www.google.de/search?sourceid=chrome&ie=UTF-8&q=php+mysql+tutorial :)
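As a starting point, here is a minimal sketch using PDO with a prepared statement. The table layout and column names are hypothetical, and it uses an in-memory SQLite database so it runs on its own (this requires the pdo_sqlite extension); for MySQL you would change the DSN and credentials:

```php
<?php
// Invented sample record; in practice these values come from the parsed XML.
$xml = simplexml_load_string(
    '<person><basics><name>Test Dummy</name><age>42</age><num_friends>42</num_friends></basics></person>'
);

// In-memory SQLite keeps the sketch self-contained; for MySQL only the DSN
// (e.g. "mysql:host=localhost;dbname=test") and credentials would change.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table layout.
$pdo->exec('CREATE TABLE people (name TEXT, age INTEGER, num_friends INTEGER)');

// A prepared statement keeps the XML values out of the SQL string itself.
$stmt = $pdo->prepare('INSERT INTO people (name, age, num_friends) VALUES (:name, :age, :friends)');
$stmt->execute([
    ':name'    => (string)$xml->basics->name,
    ':age'     => (int)$xml->basics->age,
    ':friends' => (int)$xml->basics->num_friends,
]);

$rows = $pdo->query('SELECT name, age, num_friends FROM people')->fetchAll(PDO::FETCH_ASSOC);
```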

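Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.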
