
Convert multiple XML files to one CSV with SimpleXML

I have some XML files that all share the same elements but contain different data.

First file, test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<phones>
    <phone>
        <title>"Apple iPhone 5S"</title>
        <price>
            <regularprice>500</regularprice>
            <saleprice>480</saleprice>
        </price> 
        <color>black</color>
    </phone>
</phones>

Second file, test1.xml:

<?xml version="1.0" encoding="UTF-8"?>
<phones>
    <phone>
        <title>Nokia Lumia 830</title>
        <price>
            <regularprice>400</regularprice>
            <saleprice>370</saleprice>
        </price> 
        <color>black</color>
    </phone>
</phones>

I need to convert some values from these XML files into a single test.csv file.

So I am using this PHP code:

<?php

$filexml1='test.xml';
$filexml2='test1.xml';

    //File 1
    if (file_exists($filexml1)) {
        $xml = simplexml_load_file($filexml1); 
        $f = fopen('test.csv', 'w');

    $headers = array('title', 'color');
    $converted_array = array_map("strtoupper", $headers);


    fputcsv($f, $converted_array, ',', '"');


    foreach ($xml->phone as $phone) {

        //$phone->title = trim($phone->title, " ");
        // Array of just the components you need...
        $values = array(
           "title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "), 
           "color" => (string)$phone->color
        );
        fputcsv($f, $values,',','"');

    }
    fclose($f); 

    echo "<p>File 1 converted to .csv successfully</p>";
} else {
    exit('Failed to open test.xml.');
}

    //File 2
    if (file_exists($filexml2)) {
        $xml = simplexml_load_file($filexml2); 
        $f = fopen('test.csv', 'a');


    //the same code for second file like for the first file

    echo "<p>File 2 converted to .csv successfully</p>";
} else {
    exit('Failed to open test1.xml.');
}

?>

The output in test.csv looks like this:

TITLE             COLOR
Apple iPhone 5S   black
Nokia Lumia 830   black

As you can see, I only managed to load each file into its own variable, and I have to write a separate if statement for each file, which makes the script too long. Is it possible to load all files into an array, process them with a single code block (the XML elements are the same), and write the result to one .csv file? Essentially I need the same test.csv output with less PHP code.

Thanks in advance.

Besides using an array, PHP offers other constructs that can make this even simpler. Just as an array can represent a list of your files, so can other constructs in PHP.

For example, since your XML files most likely sit inside a specific directory and follow some pattern in their filenames, they can easily be represented with a GlobIterator:

$inputFiles = new GlobIterator(__DIR__ . '/*.xml');

You could then foreach over them, which I'll show in a moment with another example.
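As a minimal sketch of what iterating over a GlobIterator looks like: the temporary directory and the file names a.xml, b.xml and notes.txt below are made up for illustration and are not part of the question.

```php
<?php
// Sketch only: create throwaway files in a temporary directory (hypothetical names).
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
file_put_contents($dir . '/a.xml', '<phones/>');
file_put_contents($dir . '/b.xml', '<phones/>');
file_put_contents($dir . '/notes.txt', 'does not match the pattern');

$names = [];
foreach (new GlobIterator($dir . '/*.xml') as $file) {
    // Each $file is an SplFileInfo object; casting it to a string gives its path.
    $names[] = $file->getFilename();
}
sort($names);
print_r($names);

// Clean up the throwaway files again.
foreach (['a.xml', 'b.xml', 'notes.txt'] as $name) {
    unlink($dir . '/' . $name);
}
rmdir($dir);
```

Only the two files matching the *.xml pattern end up in the list.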

Such a list allows you to streamline your processing. That is important because many programs follow a generic formula: Input, Process, Output. This is also called the IPO or IPO+S model, where the S stands for storage. In your case, while you process the input data you also store it in a new CSV file, which is also the output (once processing is complete).

When you follow such a generic model, it's easier to structure your code, and with a better structure you most often end up with less code. Even if not, each part of your code is smaller and more self-contained, which is most often what you're looking for.

Besides the list of XML files I showed at the beginning of this answer with the GlobIterator, there are other iterators that can help to process the XML data.

For example, you've got 1-n XML files that contain 0-n <phone> elements. You know that you want to process every one of these <phone> elements, and you already know exactly what you want to do with them (extract some data). So wouldn't it be great to first have a list of all <phone> elements within all XML files?

This can easily be done in PHP with the help of a Generator. That is a function that can return values multiple times while it is still "running". This is a simplification, so some code will illustrate it better. Let's say we have the list of XML files as input and we want all <phone> elements out of it. You could certainly create an array of all these <phone> elements and process that array later. However, a Generator can offer all these <phone> elements directly for use within a foreach loop:

function extract_phones(Traversable $files) {
    foreach ($files as $file) {
        $xml = simplexml_load_file($file);
        if ($xml === false) {
            continue;
        }
        foreach ($xml->phone as $phone) {
            yield $phone;
        }
    }
}

As this example Generator function shows, it goes over all $files, tries to load each one as a SimpleXMLElement and, if successful, iterates over all <phone> elements and yields them.

That means that if the function extract_phones is called within a foreach, that loop receives every <phone> element as a SimpleXMLElement:

foreach(extract_phones($inputFiles) as $phone) {
    # $phone is a SimpleXMLElement here
}
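To make the "returns values multiple times while still running" idea more tangible, here is a small sketch that is unrelated to XML. The function count_to and the log object are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical generator: yields numbers one by one and records
// the moment each value is actually produced.
function count_to(int $max, ArrayObject $log)
{
    for ($i = 1; $i <= $max; $i++) {
        $log->append("produce $i");
        yield $i;
    }
}

$log = new ArrayObject();
foreach (count_to(3, $log) as $n) {
    // The loop body runs between two steps of the generator.
    $log->append("consume $n");
}
print_r($log->getArrayCopy());
```

The log alternates between "produce" and "consume" entries, which suggests that each value is handed to the loop as soon as it is yielded instead of being collected into an array first.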

Your question also asks about creating the CSV file as output. This can be done by creating an SplFileObject to pass the output around and access it while processing. It works basically the same as passing the file handle around as you do in your question, but it has better semantics that make it easier to change the code later on (you could replace it with another object that behaves the same).
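A sketch of what "replace it with another object that behaves the same" could look like: the helper write_rows is a hypothetical name, and SplTempFileObject (which extends SplFileObject) stands in for the real output file so that nothing is written to disk.

```php
<?php
// Hypothetical helper: writes rows to whatever SplFileObject it is given.
function write_rows(SplFileObject $output, array $rows)
{
    foreach ($rows as $row) {
        // The escape character is passed explicitly because recent PHP
        // versions deprecate relying on its default value.
        $output->fputcsv($row, ',', '"', '\\');
    }
}

// SplTempFileObject extends SplFileObject, so it is accepted as well.
$output = new SplTempFileObject();
write_rows($output, [['title', 'color'], ['Nokia Lumia 830', 'black']]);

// Read back what was written, line by line.
$output->rewind();
$csv = '';
foreach ($output as $line) {
    $csv .= $line;
}
echo $csv;
```

The same write_rows call would work unchanged with new SplFileObject('file.csv', 'w').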

Additionally, I noticed a small detail in your code that is worth discussing first. You're encoding the quotes as HTML entities:

 trim(str_replace( "\"", "&quot;", $phone->title ), " ")

You most likely do that because you want HTML entities inside the CSV file. However, the CSV file does not need them. You also want the data in the CSV file to be as generic as possible. Whether the CSV file is later used in an HTML context or in a spreadsheet application should not be your concern when you convert the file format. My suggestion is to leave that out and deal with it in the place where it belongs: later on, for example when you use the data from the CSV to create some HTML.

That keeps your conversion and the data clean. It also removes special cases from your processing, which not only make the code more complicated but are very often the places where we introduce flaws into our programs.

I will simply remove it from my example.
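For illustration, escaping at the point where HTML is actually produced could look like the following sketch. The table cell markup is an assumption; the question does not show how the CSV is consumed.

```php
<?php
// Sketch: the CSV keeps the raw title; escaping happens only when HTML is built.
$title = '"Apple iPhone 5S"';   // raw value as it would be read from the CSV

// htmlspecialchars() turns the double quotes into &quot; for the HTML context.
$html = '<td>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . '</td>';
echo $html, "\n";
```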

So let's put this all together: get all phones from all XML files and store the fields of interest in the output CSV file:

$files  = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);

$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);

foreach ($phones as $phone) {
    $output->fputcsv(
        [
            $phone->title,
            $phone->color,
        ]
    );
}

This then creates the output file you're looking for (without the HTML entities):

title,color
"""Apple iPhone 5S""",black
"Nokia Lumia 830",black
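The doubled quotes in the first data row are CSV's own way of escaping a quote character inside a quoted field. As a quick sketch, parsing that line back with str_getcsv recovers the original title, quotes included:

```php
<?php
// Parse one line of the generated CSV back into its fields.
$line = '"""Apple iPhone 5S""",black';

// Delimiter, enclosure and escape character are passed explicitly.
$fields = str_getcsv($line, ',', '"', '\\');
print_r($fields);
```

The first field comes back as the title surrounded by a single pair of double quotes, so no information is lost by dropping the HTML entities.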

All this needs is the generator function I've already shown above, which itself is straightforward code. Everything else already ships with PHP. Here is the example code in full:

<?php
/**
 * @link http://stackoverflow.com/questions/26074850/convert-multiple-xml-files-to-csv-with-simplexml
 */

function extract_phones(Traversable $files)
{
    foreach ($files as $file) {
        $xml = simplexml_load_file($file);
        if ($xml === false) {
            continue;
        }
        foreach ($xml->phone as $phone) {
            yield $phone;
        }
    }
}

$files  = new GlobIterator(__DIR__ . '/*.xml');
$phones = extract_phones($files);

$output = new SplFileObject('file.csv', 'w');
$output->fputcsv($header = ["title", "color"]);

foreach ($phones as $phone) {
    $output->fputcsv(
        [
            $phone->title,
            $phone->color,
        ]
    );
}

echo file_get_contents($output->getFilename());

Thanks @Ghost for pointing me in the right direction. So here is my solution.

<?php

$filexml = array ('test.xml', 'test1.xml');


//Headers
$fp = fopen('file.csv', 'w');

$headers = array('title', 'color');
$converted_array = array_map("strtoupper", $headers);


fputcsv($fp, $converted_array, ',', '"');


//XML
foreach ($filexml as $file) {
    if (file_exists($file)) {
        $xml = simplexml_load_file($file);

        foreach ($xml->phone as $phone) {
            $values = array(
                "title" => (string)$phone->title = trim(str_replace ( "\"", "&quot;", $phone->title ), " "), 
                "color" => (string)$phone->color
            );
            fputcsv($fp, $values, ',', '"');
        }
        echo $file . ' converted to .csv successfully' . '<br>';
    } else {
        echo $file . ' was not found' . '<br>';
    }


}

fclose($fp);

?>
