Faster way to process xml file using PHP

I have an XML file named flight-itinerary.xml. A scaled-down version is shown below.

<itin line="1" dep="LOS" arr="ABV">
    <flt>
        <fltav>
            <cb>1</cb>
            <id>C</id>
            <av>10</av>
            <cur>NGN</cur>
            <CurInf>2,0.01,0.01</CurInf>
            <pri>15000.00</pri>
            <tax>30800.00</tax>
            <fav>1</fav>
            <miles></miles>
            <fid>11</fid>
            <finf>0,0,1</finf>

            <cb>2</cb>
            <id>J</id>
            <av>10</av>
            <cur>NGN</cur>
            <CurInf>2,0.01,0.01</CurInf>
            <pri>13000.00</pri>
            <tax>26110.00</tax>
            <fav>1</fav>
            <miles></miles>
            <fid>12</fid>
            <finf>0,0,0</finf>
        </fltav>
    </flt>
</itin>

The complete file contains 8 <itin> elements. The <fltav> element in each <itin> contains 11 groups of fields, each group running from <cb> through <finf>.

And below is the code I am using to process the file:

<?php

function processFlightsData()
{
    $data = array();
    $dom= new DOMDocument();
    $dom->load('flight-itinerary.xml');

    $classbands  = $dom->getElementsByTagName('classbands')->item(0);
    $bands       = $classbands->getElementsByTagName('band');
    $itineraries = $dom->getElementsByTagName('itin');
    $counter     = 0;

    foreach($itineraries AS $itinerary)
    { 
        $flt = $itinerary->getElementsByTagName('flt')->item(0);

        $dep = $flt->getElementsByTagName('dep')->item(0)->nodeValue;
        $arr = $flt->getElementsByTagName('arr')->item(0)->nodeValue;

        $time_data       = $flt->getElementsByTagName('time')->item(0);
        $departure_day   = $time_data->getElementsByTagName('ddaylcl')->item(0)->nodeValue;
        $departure_time  = $time_data->getElementsByTagName('dtimlcl')->item(0)->nodeValue;
        $departure_date  = $departure_day. ' '. $departure_time;
        $arrival_day     = $time_data->getElementsByTagName('adaylcl')->item(0)->nodeValue;
        $arrival_time    = $time_data->getElementsByTagName('atimlcl')->item(0)->nodeValue;
        $arrival_date    = $arrival_day. ' '. $arrival_time;
        $flight_duration = $time_data->getElementsByTagName('duration')->item(0)->nodeValue;

        $flt_det       = $flt->getElementsByTagName('fltdet')->item(0);
        $airline_id    = $flt_det->getElementsByTagName('airid')->item(0)->nodeValue;
        $flt_no        = $flt_det->getElementsByTagName('fltno')->item(0)->nodeValue;
        $flight_number = $airline_id. $flt_no;
        $airline_type  = $flt_det->getElementsByTagName('eqp')->item(0)->nodeValue;
        $stops         = $flt_det->getElementsByTagName('stp')->item(0)->nodeValue;

        $av_data = $flt->getElementsByTagName('fltav')->item(0);

        $cbs     = iterator_to_array($av_data->getElementsByTagName('cb')); //11 entries
        $ids     = iterator_to_array($av_data->getElementsByTagName('id')); //ditto
        $seats   = iterator_to_array($av_data->getElementsByTagName('av')); //ditto
        $curr    = iterator_to_array($av_data->getElementsByTagName('cur')); //ditto
        $price   = iterator_to_array($av_data->getElementsByTagName('pri')); //ditto
        $tax     = iterator_to_array($av_data->getElementsByTagName('tax')); //ditto
        $miles   = iterator_to_array($av_data->getElementsByTagName('miles')); //ditto
        $fid     = iterator_to_array($av_data->getElementsByTagName('fid')); //ditto    

        $inner_counter = 0;

        for($i = 0; $i < count($ids); $i++)
        {
            $data[$counter][$inner_counter] = array
            (
                'flight_number'                   => $flight_number,
                'flight_duration'                 => $flight_duration, 
                'departure_date'                  => $departure_date,
                'departure_time'                  => substr($departure_time, 0, 5),
                'arrival_date'                    => $arrival_date,
                'arrival_time'                    => substr($arrival_time, 0, 5),
                'departure_airport_code'          => $dep,
                'departure_airport_location_name' => get_airport_data($dep, 'location'),
                'arrival_airport_code'            => $arr,
                'arrival_airport_location_name'   => get_airport_data($arr, 'location'),
                'stops'                           => $stops,
                'cabin_class'                     => $ids[$i]->nodeValue,
                'ticket_class'                    => $ids[$i]->nodeValue,
                'ticket_class_nicename'           => formate_ticket_class_name($ids[$i]->nodeValue),
                'available_seats'                 => $seats[$i]->nodeValue,
                'currency'                        => $curr[$i]->nodeValue,
                'price'                           => $price[$i]->nodeValue,
                'tax'                             => $tax[$i]->nodeValue,
                'miles'                           => $miles[$i]->nodeValue,
            );

            ++$inner_counter;
        }

        ++$counter; // move to the next itinerary's slot in $data
    }

    return $data;
}

?>

The outer loop runs 8 times, once per <itin> element, and on each pass the inner loop runs 11 times, for a total of 88 iterations. This is causing serious performance issues. I am looking for a faster way to process the file. Any help will be greatly appreciated.

I don't think the loop is the bottleneck. You should check the operations that are called within the loop, get_airport_data and formate_ticket_class_name.

Running your code (without the auxiliary operations) on a number of itin elements takes less than a second; see this fiddle: http://phpfiddle.org/main/code/7fpi-b3ka (note that the XML might not match yours, since I had to guess a lot of the elements that were missing).
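One way to confirm where the time goes is to time the helper calls separately from the bare loop. The sketch below is only an illustration: the get_airport_data body is a made-up stand-in (the real function is not shown in the question), the 5 ms delay is an assumption, and time_calls is a hypothetical helper.

```php
<?php
// Hypothetical stand-in for the question's get_airport_data(); the real
// implementation is not shown, so this only simulates a slow lookup.
function get_airport_data($code, $data_key)
{
    usleep(5000); // assumption: each call costs roughly 5 ms
    return $code . ' (' . $data_key . ')';
}

// Hypothetical helper: run a callable $calls times, return elapsed seconds.
function time_calls(callable $fn, $calls)
{
    $start = microtime(true);
    for ($i = 0; $i < $calls; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// 88 calls mirrors 8 itineraries x 11 fare groups.
$helper_seconds = time_calls(function () {
    get_airport_data('LOS', 'location');
}, 88);

// Same number of iterations with an empty body: the cost of the loop alone.
$loop_seconds = time_calls(function () {
}, 88);

printf("helper calls: %.3f s, bare loop: %.6f s\n", $helper_seconds, $loop_seconds);
```

If the helper figure dominates, as it does with this simulated delay, the loop itself is not worth optimizing.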

If any of the operations called in the loop increase the processing time substantially, try to call them with bulk data or cache their responses.
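Caching could look roughly like the sketch below. It assumes the lookup returns the same value for the same arguments during a request. The names slow_airport_lookup and cached_airport_lookup are hypothetical, and the airport array is sample data standing in for whatever get_airport_data really queries.

```php
<?php
// Counts how often the expensive lookup actually runs.
$lookup_calls = 0;

// Hypothetical stand-in for the real get_airport_data(), which is not shown
// in the question; the array below is sample data only.
function slow_airport_lookup($code, $data_key)
{
    global $lookup_calls;
    $lookup_calls++;
    $airports = array(
        'LOS' => array('location' => 'Lagos'),
        'ABV' => array('location' => 'Abuja'),
    );
    return isset($airports[$code][$data_key]) ? $airports[$code][$data_key] : null;
}

// Memoizing wrapper: each distinct (code, key) pair is looked up only once.
function cached_airport_lookup($code, $data_key)
{
    static $cache = array();
    $key = $code . '|' . $data_key;
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = slow_airport_lookup($code, $data_key);
    }
    return $cache[$key];
}

// 88 iterations, as in the question (8 itineraries x 11 fare groups).
$names = array();
for ($i = 0; $i < 88; $i++) {
    $names[] = cached_airport_lookup('LOS', 'location')
        . ' to ' . cached_airport_lookup('ABV', 'location');
}

echo $names[0], "\n";      // Lagos to Abuja
echo $lookup_calls, "\n";  // 2
```

In the question's code the same effect can be had without a cache by calling get_airport_data($dep, ...) and get_airport_data($arr, ...) once per itinerary, before the inner loop, since $dep and $arr do not change inside it.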
