
PHP - Create XML from multiple MySQL queries and sort by date

I have 10-20 log tables in a MySQL database. Each table contains 50-100.000 rows. I need to export them to XML and sort them by creation date.

UNION is not a good option, because the tables do not contain the same columns (one table might contain 3 columns, another 30).

This is how I create the XML:

// Events
$stmt = $db->query("
  SELECT id, columnX, created
  FROM table1
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnX", $row['columnX']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

// Other events
$stmt = $db->query("
  SELECT id, columnY1, columnY2, columnY3, created
  FROM table2
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnY1", $row['columnY1']));
    $event->appendChild($xml->createElement("columnY2", $row['columnY2']));
    $event->appendChild($xml->createElement("columnY3", $row['columnY3']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

Does anyone know how to solve this?

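If every query can be sorted on its own, you can produce a sorted final XML by fetching from all queries in parallel and always printing the row with the earliest date next, as in the code below.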

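Note that this code will probably hold the data returned by all queries in memory at once, because you cannot use unbuffered queries in this case. I do not know how large your data sets are.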

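If memory is a concern, the same algorithm can be used to combine any data sources. So instead of merging SQL results, you could prepare three XML files (one per query) and merge those. Combined with MySQL unbuffered queries, memory usage would probably be better, but the export would be slower because the XML has to be generated and then parsed again.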

// Wrap a query in a generator that yields one row at a time
function processQuery(mysqli $db, $sql) {
    $q = $db->query($sql);
    while ($row = $q->fetch_assoc()) {
        yield $row;
    }
}

// Prepare all queries; each one must already be sorted by `created`
$queries = [
    processQuery($db, "SELECT id, columnX, created FROM table1 ORDER BY created"),
    processQuery($db, "SELECT id, columnY1, columnY2, columnY3, created FROM table2 ORDER BY created"),
    processQuery($db, "SELECT id, created FROM table3 ORDER BY created"),
];

// Run every query and position each generator on its first row.
// rewind() is used because next() on a fresh generator would skip the first row.
foreach ($queries as $query) {
    $query->rewind(); // see \Generator
}

// Loop while any query still has rows (valid() is false once a generator is exhausted)
while (array_filter($queries, function (Generator $query) { return $query->valid(); })) {
    // Find the query whose current row has the earliest date
    $minTimestamp = null;
    $queryWithMin = null;
    foreach ($queries as $queryId => $query) {
        if (!$query->valid()) {
            continue; // this query has no rows left
        }
        $current = $query->current();
        if ($minTimestamp === null || $current['created'] < $minTimestamp) {
            // this query's row has an earlier date than those seen so far
            $minTimestamp = $current['created'];
            $queryWithMin = $queryId;
        }
    }
    // We now know which query holds the row with the earliest date.
    // PRINT_TO_XML is a placeholder for your own XML output code.
    PRINT_TO_XML($queries[$queryWithMin]->current());
    // move the cursor of this query to its next row
    $queries[$queryWithMin]->next();
}
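
Since the same merge works for any pre-sorted sources, here is a minimal sketch of it written against plain iterators rather than mysqli results. It swaps the linear scan above for an `SplMinHeap`, which is an alternative technique and not part of the original answer. The function name `mergeSorted` and the sample rows are hypothetical and only illustrate the idea.

```php
<?php
// Hypothetical helper: merge iterators that are each already sorted by $key
// and yield their rows in global order.
function mergeSorted(array $sources, string $key): Generator
{
    $heap = new SplMinHeap();
    foreach ($sources as $id => $source) {
        $source->rewind();
        if ($source->valid()) {
            $row = $source->current();
            // Heap entries are [key value, source id, row]. Each source has at
            // most one entry in the heap, so the row itself is never compared.
            $heap->insert([$row[$key], $id, $row]);
        }
    }
    while (!$heap->isEmpty()) {
        [, $id, $row] = $heap->extract();
        yield $row;
        // Refill the heap from the source that just supplied a row
        $sources[$id]->next();
        if ($sources[$id]->valid()) {
            $next = $sources[$id]->current();
            $heap->insert([$next[$key], $id, $next]);
        }
    }
}

// Sample data standing in for two sorted query results (made up for this sketch)
$a = new ArrayIterator([
    ['id' => 1, 'created' => '2024-01-01 10:00:00'],
    ['id' => 2, 'created' => '2024-01-03 09:00:00'],
]);
$b = new ArrayIterator([
    ['id' => 7, 'created' => '2024-01-02 12:00:00', 'columnY1' => 'x'],
]);

$merged = iterator_to_array(mergeSorted([$a, $b], 'created'), false);
```

With 10-20 tables the heap avoids rescanning every source for each output row. The sources could just as well be generators like `processQuery()` above, as long as each one is sorted by the same key.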

Another approach is to use a MySQL UNION only to fetch the IDs (already sorted), and then process them in batches.

$q = $db->query("SELECT 'table1' AS tableName, id, created FROM table1
UNION ALL SELECT 'table2' AS tableName, id, created FROM table2
UNION ALL SELECT 'table3' AS tableName, id, created FROM table3
ORDER BY created");

// Remember only [table name, id] for every row, in sorted order
$sorter = [];
while ($row = $q->fetch_assoc()) {
    $sorter[] = [$row['tableName'], $row['id']];
}

foreach (array_chunk($sorter, 5000) as $dataChunk) {
    // get ids from each table
    $table1Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table1'; }));
    $table2Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table2'; }));
    $table3Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table3'; }));
    // load full data from each table (skip the query when the chunk has no ids for it,
    // because an empty IN () list is a SQL syntax error)
    $dataTable1 = [];
    if ($table1Ids) {
        $q = $db->query("SELECT * FROM table1 WHERE id IN (" . implode(",", $table1Ids) . ")");
        while ($row = $q->fetch_assoc()) {
            // CREATE_XML is a placeholder for your own row-to-XML code
            $dataTable1[$row['id']] = CREATE_XML($row);
        }
    }
    // ... same with table2
    // ... same with table3
    // output the rows in the order given by the sorted chunk
    foreach ($dataChunk as $row) {
        if ($row[0] === 'table1') {
            echo $dataTable1[$row[1]];
        }
        if ($row[0] === 'table2') {
            echo $dataTable2[$row[1]];
        }
        if ($row[0] === 'table3') {
            echo $dataTable3[$row[1]];
        }
    }
}

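This approach uses less memory, but in this exact code you still have to load all IDs into memory first. It can easily be rewritten to generate the XML inside the first loop ( if (count($sorter) > 5000) { printXmlForIds($sorter); $sorter = []; } ), and then the algorithm will not exceed the memory limit.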
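
The rewrite suggested above could look roughly like the following. This is a sketch only: `flushInBatches` is a hypothetical name, the callback stands in for the `printXmlForIds()` mentioned in the answer, and the sample references are made up so the snippet runs without a database.

```php
<?php
// Hypothetical helper: consume sorted [tableName, id] references one at a
// time and hand them to $printXmlForIds in batches, so that at most
// $batchSize references are held in memory at once.
function flushInBatches(iterable $sortedRefs, int $batchSize, callable $printXmlForIds): void
{
    $sorter = [];
    foreach ($sortedRefs as $ref) {
        $sorter[] = $ref;
        if (count($sorter) >= $batchSize) {
            $printXmlForIds($sorter);
            $sorter = [];
        }
    }
    // Do not forget the last, partially filled batch
    if ($sorter) {
        $printXmlForIds($sorter);
    }
}

// Sample references standing in for rows of the sorted UNION query
$refs = [
    ['table1', 1], ['table2', 7], ['table1', 2], ['table3', 4], ['table2', 8],
];

// The callback here only records batch sizes; a real one would load the
// full rows for each table and print them, as in the chunk loop above.
$batchSizes = [];
flushInBatches($refs, 2, function (array $batch) use (&$batchSizes) {
    $batchSizes[] = count($batch);
});
```

For this to save memory with mysqli, the UNION query itself would have to be unbuffered (`MYSQLI_USE_RESULT`). In that case the per-batch queries need a second connection, because a connection cannot run other statements while an unbuffered result is still being read.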

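I suggest using an INSERT INTO ... SELECT ... UNION ... SELECT construct to pull all the data into a (temporary) table. INSERT INTO ... SELECT lets you insert the result of a select directly into a table. UNION lets you merge SELECT results. Because it is a single database statement, all of the work happens inside the DBMS.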

After that, use a select to fetch the data ordered by the date field, and use XMLWriter to create the XML.
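
A rough sketch of both steps follows. The table name `all_events`, the `source` column and the function `writeEvents` are assumptions made for illustration and do not come from the answer. Because UNION needs the same number of columns in every SELECT, the sketch pads the columns a table does not have with NULL. The SQL string is shown but not executed, and the sample rows stand in for the result of `SELECT * FROM all_events ORDER BY created`.

```php
<?php
// Hypothetical statement: fill one wide (temporary) table from all log tables.
// Columns that a source table lacks are padded with NULL.
$insertSql = <<<'SQL'
INSERT INTO all_events (source, id, columnX, columnY1, columnY2, columnY3, created)
SELECT 'table1', id, columnX, NULL, NULL, NULL, created FROM table1
UNION ALL
SELECT 'table2', id, NULL, columnY1, columnY2, columnY3, created FROM table2
SQL;

// Hypothetical helper: stream already-sorted rows into an <events> document.
function writeEvents(iterable $rows): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('events');
    foreach ($rows as $row) {
        $xml->startElement('event');
        foreach ($row as $column => $value) {
            if ($value === null) {
                continue; // padded column, not present in the source table
            }
            // writeElement() escapes special characters such as & and <
            $xml->writeElement($column, (string) $value);
        }
        $xml->endElement(); // </event>
    }
    $xml->endElement(); // </events>
    $xml->endDocument();
    return $xml->outputMemory();
}

// Sample rows, made up for this sketch
$rows = [
    ['source' => 'table1', 'id' => 1, 'columnX' => 'a & b', 'columnY1' => null, 'created' => '2024-01-01 10:00:00'],
    ['source' => 'table2', 'id' => 7, 'columnX' => null, 'columnY1' => 'y', 'created' => '2024-01-02 12:00:00'],
];
$out = writeEvents($rows);
```

`openMemory()` keeps the sketch self-contained. For an export of this size, `XMLWriter::openUri()` with a file path or `php://output` would probably be the better fit, since the document is then written out as it is produced.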

