
PHP - Create XML from multiple MySQL queries and sort by date

I have 10-20 log tables in a MySQL database. Each table contains 50-100.000 rows. I need to export them to XML and sort them by creation date.

UNION is not a good option, because the tables do not contain the same columns (one table might have 3 columns, another 30).

This is how I create the XML:

// Events
$stmt = $db->query("
  SELECT id, columnX, created
  FROM table1
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnX", $row['columnX']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

// Other events
$stmt = $db->query("
  SELECT id, columnY1, columnY2, columnY3, created
  FROM table2
");
$row_count = $stmt->rowCount();
if ($row_count != '0') {
  while($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $event = $xml->createElement("event");
    $events->appendChild($event);
    $event->appendChild($xml->createElement("ID", "XXXX"));
    $event->appendChild($xml->createElement("columnY1", $row['columnY1']));
    $event->appendChild($xml->createElement("columnY2", $row['columnY2']));
    $event->appendChild($xml->createElement("columnY3", $row['columnY3']));
    $event->appendChild($xml->createElement("created", $row['created']));
  }
}

Does anyone know how to solve this?

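If every query can be sorted by date, you can produce a sorted final XML by running all the queries and then merging their rows as they are printed, as in the code below.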

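Be aware that this code will probably need enough memory to hold the data returned by all the queries at once, because you cannot use unbuffered queries in this case. I don't know how big your datasets are.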

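If memory is a concern, the same algorithm can combine any kind of data source. So instead of merging SQL results, you could prepare one XML file per query and merge those files. Combined with MySQL unbuffered queries, this would probably use less memory, but it would be slower because the XML has to be generated and then parsed again.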

// convert queries to generator
function processQuery(mysqli $db, $sql) {
    $q = $db -> query($sql);
    while ($row = $q -> fetch_assoc()) {
        // just yield
        yield $row;
    }
}

// prepare all queries
$queries = [
    processQuery($db, "SELECT id, columnX, created FROM table1 ORDER BY created"),
    processQuery($db, "SELECT id, columnY1, columnY2, columnY3, created FROM table2 ORDER BY created"),
    processQuery($db, "SELECT id, created FROM table3 ORDER BY created"),
];

// run all queries and fetch first row
// (rewind() rather than next(): calling next() on a fresh generator skips the first row)
foreach ($queries as $query) {
    $query -> rewind(); // see \Generator
}

// now, we will run while any query still has rows
while (array_filter($queries, function(Generator $query) { return $query -> valid(); })) {
    // now we have to find the query whose next row has the minimal date
    $minTimestamp = NULL;
    $queryWithMin = NULL;
    foreach ($queries as $queryId => $query) {
        // skip exhausted queries; current() returns NULL for them, not FALSE
        if ($query -> valid()) {
            $current = $query -> current();
            if ($minTimestamp === NULL || $minTimestamp > $current['created']) {
                // this query has a row with a lower date than the previous queries
                $minTimestamp = $current['created'];
                $queryWithMin = $queryId;
            }
        }
    }
    // we now know which query returns the row with the minimal date
    PRINT_TO_XML($queries[$queryWithMin] -> current()); // placeholder for the XML output code
    // move cursor of this query to next row
    $queries[$queryWithMin] -> next();
}
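
The merge above can be tried without a database. The following is a minimal sketch, assuming each source is already sorted by `created` and that the timestamps are strings in `Y-m-d H:i:s` format, so plain string comparison orders them correctly. The helper names `rowsFrom` and `mergeByCreated` are hypothetical, and the arrays only stand in for query results.

```php
<?php
// Hypothetical stand-in for processQuery(): yields rows from an array
// instead of a mysqli result, so the sketch runs without a database.
function rowsFrom(array $rows): Generator {
    foreach ($rows as $row) {
        yield $row;
    }
}

// Merge several generators, each already sorted by 'created',
// into one stream that is sorted by 'created'.
function mergeByCreated(array $sources): Generator {
    foreach ($sources as $source) {
        $source->rewind(); // run to the first row without skipping it
    }
    while (true) {
        $minKey = null;
        foreach ($sources as $key => $source) {
            if (!$source->valid()) {
                continue; // this source is exhausted
            }
            if ($minKey === null
                || $source->current()['created'] < $sources[$minKey]->current()['created']) {
                $minKey = $key;
            }
        }
        if ($minKey === null) {
            return; // every source is exhausted
        }
        yield $sources[$minKey]->current();
        $sources[$minKey]->next();
    }
}

// Sample data (assumed), one array per table.
$table1 = [
    ['id' => 1, 'columnX' => 'a', 'created' => '2024-01-01 10:00:00'],
    ['id' => 2, 'columnX' => 'b', 'created' => '2024-01-03 09:00:00'],
];
$table2 = [
    ['id' => 1, 'columnY1' => 'p', 'created' => '2024-01-02 08:00:00'],
];
$table3 = []; // an empty table must not break the merge

$merged = iterator_to_array(
    mergeByCreated([rowsFrom($table1), rowsFrom($table2), rowsFrom($table3)]),
    false
);
```

Each step scans all sources to find the minimum, which is fine for 10-20 tables; a heap would only pay off with many more sources.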

Another approach is to use a MySQL UNION only to fetch the IDs (already sorted) and then process them in batches.

$q = $db -> query("SELECT 'table1' AS tableName, id, created FROM table1
    UNION ALL SELECT 'table2' AS tableName, id, created FROM table2
    UNION ALL SELECT 'table3' AS tableName, id, created FROM table3
    ORDER BY created");

$sorter = [];
while ($row = $q -> fetch_assoc()) {
    $sorter []= [$row['tableName'], $row['id']];
}

foreach (array_chunk($sorter, 5000) as $dataChunk) {
    // get ids from each table
    $table1Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table1'; }));
    $table2Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table2'; }));
    $table3Ids = array_map(function($rowInfo) { return $rowInfo[1]; }, array_filter($dataChunk, function($rowInfo) { return $rowInfo[0] === 'table3'; }));
    // load full data from each table
    $dataTable1 = [];
    if ($table1Ids) { // an empty ID list would produce invalid SQL: IN ()
        $q = $db -> query("SELECT * FROM table1 WHERE id IN (".implode(",", $table1Ids).")");
        while ($row = $q -> fetch_assoc()) {
            $dataTable1[$row['id']] = CREATE_XML($row);
        }
    }
    // ... same with table2
    // ... same with table3
    // output the rows in sorted order; $row is [tableName, id]
    foreach ($dataChunk as $row) {
        if ($row[0] === 'table1') {
            echo $dataTable1[$row[1]];
        }
        if ($row[0] === 'table2') {
            echo $dataTable2[$row[1]];
        }
        if ($row[0] === 'table3') {
            echo $dataTable3[$row[1]];
        }
    }
}

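This approach consumes less memory, but the exact code above still loads all the IDs into memory first. It can easily be rewritten to generate the XML inside the first loop ( if (count($sorter) > 5000) { printXmlForIds($sorter); $sorter = []; } ), and then the algorithm will not exceed the memory limit.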
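
The rewrite suggested above could look roughly like this. It is a sketch under assumptions: `printXmlForIds` is the hypothetical helper named in the answer (passed in here as a callable), `processInBatches` is a made-up name, and the in-memory `$rows` array stands in for the rows fetched from the UNION ALL query.

```php
<?php
// Consume sorted (tableName, id) rows and hand them to a callback in
// batches, so at most $batchSize pairs are held in $sorter at a time.
function processInBatches(iterable $sortedRows, int $batchSize, callable $printXmlForIds): void {
    $sorter = [];
    foreach ($sortedRows as $row) {
        $sorter[] = [$row['tableName'], $row['id']];
        if (count($sorter) >= $batchSize) {
            $printXmlForIds($sorter);
            $sorter = [];
        }
    }
    if ($sorter) {
        $printXmlForIds($sorter); // flush the final, partial batch
    }
}

// Sample rows (assumed), already ordered by creation date.
$rows = [
    ['tableName' => 'table1', 'id' => 1],
    ['tableName' => 'table2', 'id' => 1],
    ['tableName' => 'table3', 'id' => 7],
    ['tableName' => 'table1', 'id' => 2],
    ['tableName' => 'table1', 'id' => 3],
];

// Here the callback only records the batches; in the real export it
// would load the full rows per table and print their XML.
$batches = [];
processInBatches($rows, 2, function (array $batch) use (&$batches) {
    $batches[] = $batch;
});
```

With mysqli, a buffered result still keeps the whole ID list on the client. An unbuffered result (`MYSQLI_USE_RESULT`) avoids that, but the per-table lookups would then need a second connection, because a connection cannot run other queries while an unbuffered result is still open.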

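I suggest using an INSERT INTO ... SELECT ... UNION ... SELECT construct to load all the data into a (temporary) table. INSERT INTO ... SELECT lets you insert the result of a select directly into a table. UNION lets you combine SELECT results. Because it is a single database statement, all the work happens inside the DBMS.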

After that, use a select to fetch the data ordered by the date field and create the XML with XMLWriter.
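
As a rough illustration of that idea, the sketch below shows what the fill statement and the XMLWriter step might look like. Everything specific in it is an assumption: the table name `export_events`, the choice to pad missing columns with NULL so that every SELECT has the same column list, the helper name `writeEvents`, and the sample rows that stand in for the result of `SELECT ... ORDER BY created`. The SQL string is only defined, not executed.

```php
<?php
// Assumed fill statement: every SELECT must list the same columns,
// so columns a table does not have are padded with NULL.
$fillSql = <<<'SQL'
INSERT INTO export_events (source, id, columnX, columnY1, created)
SELECT 'table1', id, columnX, NULL, created FROM table1
UNION ALL
SELECT 'table2', id, NULL, columnY1, created FROM table2
SQL;

// Write rows (already sorted by 'created') as <event> elements.
// NULL columns are skipped, so each event only has its own columns.
function writeEvents(XMLWriter $writer, iterable $rows): void {
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('events');
    foreach ($rows as $row) {
        $writer->startElement('event');
        foreach ($row as $column => $value) {
            if ($value !== null) {
                // XMLWriter escapes special characters in the text
                $writer->writeElement($column, (string) $value);
            }
        }
        $writer->endElement(); // </event>
    }
    $writer->endElement(); // </events>
    $writer->endDocument();
}

// Sample rows (assumed) in the shape the sorted select would return.
$rows = [
    ['id' => 1, 'columnX' => 'a & b', 'columnY1' => null, 'created' => '2024-01-01 10:00:00'],
    ['id' => 1, 'columnX' => null, 'columnY1' => 'p', 'created' => '2024-01-02 08:00:00'],
];

$writer = new XMLWriter();
$writer->openMemory(); // openUri() with a file path would write to disk instead
writeEvents($writer, $rows);
$xml = $writer->outputMemory();
```

For a large export, writing to a file and calling `flush()` periodically keeps the writer's buffer small, which is the main reason to prefer XMLWriter over building a full DOM tree.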

