PHP - Create XML from multiple MySQL queries and sort by date

I have 10-20 log tables in a MySQL database. Each table contains 50-100.000 rows. I need to export these to XML and sort them by creation date.

UNION is not a good option, as the tables don't contain the same columns (one table might contain 3 columns, and another 30 columns).

This is how I create the XML:

// Events
$stmt = $db->query("
  SELECT id, columnX, created
  FROM table1
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnX", $row['columnX']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

// Other events
$stmt = $db->query("
  SELECT id, columnY1, columnY2, columnY3, created
  FROM table2
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnY1", $row['columnY1']));
    $event->appendChild($xml->createElement("columnY2", $row['columnY2']));
    $event->appendChild($xml->createElement("columnY3", $row['columnY3']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

Does anyone have an idea of how to solve this?

If each query can be sorted on its own, you can produce a sorted final XML by running all the queries and then printing their rows in merged order, as in the code below.

Be aware that this code will probably consume as much memory as the data returned by all queries at once, because you cannot use unbuffered queries in this case. I don't know how big the datasets you are talking about are.

If memory is a concern, you can use the same algorithm to combine any data source. So you can prepare three XML files (one per query) and combine those instead of combining the SQL results. In combination with MySQL unbuffered queries, that would probably be the better variant for memory usage, but slower, as you will need to generate and parse XML.

// wrap a query in a generator that yields one row at a time
function processQuery(mysqli $db, $sql) {
    $q = $db->query($sql);
    while ($row = $q->fetch_assoc()) {
        yield $row;
    }
}

// prepare all queries, each one already sorted by creation date
$queries = [
    processQuery($db, "SELECT id, columnX, created FROM table1 ORDER BY created"),
    processQuery($db, "SELECT id, columnY1, columnY2, columnY3, created FROM table2 ORDER BY created"),
    processQuery($db, "SELECT id, created FROM table3 ORDER BY created"),
];

// run all queries and fetch the first row of each
foreach ($queries as $query) {
    $query->rewind(); // see \Generator; calling next() here would skip the first row
}

// loop while at least one query still has rows
while (array_filter($queries, function (Generator $query) { return $query->valid(); })) {
    // find the query whose current row has the minimal date
    $minTimestamp = null;
    $queryWithMin = null;
    foreach ($queries as $queryId => $query) {
        if (!$query->valid()) {
            continue; // this query has no more rows
        }
        $current = $query->current();
        if ($minTimestamp === null || $minTimestamp > $current['created']) {
            // this query has a row with a lower date than the previous queries
            $minTimestamp = $current['created'];
            $queryWithMin = $queryId;
        }
    }
    // we now know which query holds the row with the minimal date
    // (PRINT_TO_XML is a placeholder for your own XML output routine)
    PRINT_TO_XML($queries[$queryWithMin]->current());
    // move the cursor of this query to the next row
    $queries[$queryWithMin]->next();
}
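
Since the same algorithm is said to work for any pre-sorted data source, here is a minimal sketch of it as a reusable generator, with plain arrays standing in for query results. The names mergeByCreated and rowsFrom are made up for illustration, and the sketch assumes every row has a 'created' value in a format that sorts correctly as a string (such as MySQL's 'YYYY-MM-DD HH:MM:SS').

```php
<?php
// Merge several generators whose rows are each already sorted by 'created'.
// (Hypothetical helper; not part of the original answer.)
function mergeByCreated(array $sources): Generator {
    // Start every generator so that current() points at its first row.
    foreach ($sources as $source) {
        $source->rewind();
    }
    while (true) {
        $minKey = null;
        foreach ($sources as $key => $source) {
            if (!$source->valid()) {
                continue; // this source is exhausted
            }
            if ($minKey === null
                || strcmp($source->current()['created'], $sources[$minKey]->current()['created']) < 0) {
                $minKey = $key;
            }
        }
        if ($minKey === null) {
            return; // every source is exhausted
        }
        yield $sources[$minKey]->current();
        $sources[$minKey]->next();
    }
}

// Stand-in for a query result: yields rows from an array.
function rowsFrom(array $rows): Generator {
    foreach ($rows as $row) {
        yield $row;
    }
}

$merged = mergeByCreated([
    rowsFrom([
        ['id' => 1, 'created' => '2020-01-01 10:00:00'],
        ['id' => 2, 'created' => '2020-01-03 09:00:00'],
    ]),
    rowsFrom([
        ['id' => 7, 'created' => '2020-01-02 12:00:00'],
    ]),
]);

foreach ($merged as $row) {
    echo $row['id'], ' ', $row['created'], PHP_EOL;
}
```

With 10-20 sources the linear scan for the minimum on every row is probably still cheap; a heap would only start to pay off with many more sources.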

Another approach could be to use a MySQL UNION only for getting the IDs (already sorted), and then process them in batches.

$q = $db->query("SELECT 'table1' AS tableName, id, created FROM table1
    UNION ALL SELECT 'table2' AS tableName, id, created FROM table2
    UNION ALL SELECT 'table3' AS tableName, id, created FROM table3
    ORDER BY created");

$sorter = [];
while ($row = $q->fetch_assoc()) {
    $sorter[] = [$row['tableName'], $row['id']];
}

foreach (array_chunk($sorter, 5000) as $dataChunk) {
    // get ids from each table
    $table1Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table1'; }));
    $table2Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table2'; }));
    $table3Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table3'; }));
    // load full data from each table
    $dataTable1 = [];
    if ($table1Ids) { // skip the query if this chunk has no table1 rows: "IN ()" is invalid SQL
        $q = $db->query("SELECT * FROM table1 WHERE id IN (" . implode(",", $table1Ids) . ")");
        while ($row = $q->fetch_assoc()) {
            // CREATE_XML is a placeholder for your own row-to-XML routine
            $dataTable1[$row['id']] = CREATE_XML($row);
        }
    }
    // ... same with table2
    // ... same with table3
    // output the rows in sorted order
    foreach ($dataChunk as $row) {
        if ($row[0] === 'table1') {
            echo $dataTable1[$row[1]];
        }
        if ($row[0] === 'table2') {
            echo $dataTable2[$row[1]];
        }
        if ($row[0] === 'table3') {
            echo $dataTable3[$row[1]];
        }
    }
}

This approach uses less memory, but in this exact code you still need to load all IDs into memory first. It can easily be rewritten to generate the XML inside the first loop ( if (count($sorter) > 5000) { printXmlForIds($sorter); $sorter = []; } ), so that the algorithm does not exceed the memory limit.
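
The rewrite hinted at above could look roughly like the following sketch. The function name flushInBatches is made up for illustration, printXmlForIds is passed in as a callback because its body is not given here, and an array stands in for the sorted UNION result. Note that with mysqli the memory saving presumably also requires reading the UNION result unbuffered (MYSQLI_USE_RESULT) on a separate connection, since no other query can run on a connection while an unbuffered result is still being read.

```php
<?php
// Hand (tableName, id) pairs from an already sorted source to a callback
// in batches, so that at most $batchSize pairs are held in memory at once.
// (Hypothetical helper; not part of the original answer.)
function flushInBatches(iterable $sortedRefs, int $batchSize, callable $printXmlForIds): void {
    $sorter = [];
    foreach ($sortedRefs as $ref) {
        $sorter[] = $ref;
        if (count($sorter) >= $batchSize) {
            $printXmlForIds($sorter);
            $sorter = [];
        }
    }
    if ($sorter !== []) {
        $printXmlForIds($sorter); // the final, partially filled batch
    }
}

// Stand-in data: in the real code these pairs would come from the UNION query.
$refs = [['table1', 1], ['table2', 5], ['table1', 2], ['table3', 9], ['table2', 6]];

flushInBatches($refs, 2, function (array $batch): void {
    // A real callback would load the full rows and print their XML here.
    echo count($batch), ' refs: ', json_encode($batch), PHP_EOL;
});
```

The flush after the loop matters: without it, the last rows are silently dropped whenever the total is not a multiple of the batch size.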

I suggest using an INSERT INTO ... SELECT ... UNION ... SELECT construct to fetch all the data into a (temporary) table. INSERT INTO ... SELECT lets you insert the result of a SELECT directly into a table. UNION lets you concatenate SELECT results. Because it is a database statement, it all happens inside the DBMS.

After that, use a SELECT to fetch the data ordered by the date field, and use XMLWriter to create the XML.
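
A rough sketch of this suggestion follows. Everything named here is an assumption for illustration: the staging table all_events, its column types, and the helper writeEventsXml. Columns that a source table lacks are padded with NULL so the SELECT lists line up, and UNION ALL is used rather than plain UNION because plain UNION also removes duplicate rows. The demo feeds the writer from an array so it runs without a database.

```php
<?php
// Hypothetical staging statements (table name and column types are assumptions).
$stagingStatements = [
    "CREATE TEMPORARY TABLE all_events (
         source   VARCHAR(32) NOT NULL,
         id       INT NOT NULL,
         created  DATETIME NOT NULL,
         columnX  TEXT NULL,
         columnY1 TEXT NULL
     )",
    "INSERT INTO all_events (source, id, created, columnX, columnY1)
     SELECT 'table1', id, created, columnX, NULL FROM table1
     UNION ALL
     SELECT 'table2', id, created, NULL, columnY1 FROM table2",
];
// With a real connection one might run them like this:
//   foreach ($stagingStatements as $sql) { $db->query($sql); }
//   $rows = $db->query("SELECT * FROM all_events ORDER BY created");

// Write one <event> per row; NULL columns (the padding) are left out.
function writeEventsXml(iterable $rows): string {
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('events');
    foreach ($rows as $row) {
        $xml->startElement('event');
        foreach ($row as $column => $value) {
            if ($value !== null) {
                // writeElement escapes special characters such as & and <
                $xml->writeElement($column, (string) $value);
            }
        }
        $xml->endElement(); // </event>
    }
    $xml->endElement(); // </events>
    $xml->endDocument();
    return $xml->outputMemory();
}

// Stand-in rows, shaped like the result of the ordered SELECT.
$rows = [
    ['source' => 'table1', 'id' => 1, 'created' => '2020-01-01 10:00:00', 'columnX' => 'a & b', 'columnY1' => null],
    ['source' => 'table2', 'id' => 7, 'created' => '2020-01-02 12:00:00', 'columnX' => null, 'columnY1' => 'y'],
];
echo writeEventsXml($rows);
```

For an export of this size, building the whole document in memory is probably not what you want; XMLWriter can also write to a file or stream via openUri(), with flush() called periodically inside the loop.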
