
Generating Sitemap External to WordPress

So, I've built an "external to WordPress" sitemap generator that uses PDO to query the WordPress database.

I have found (after a month) an issue with it that I am completely stumped on: the parent -> child page links in it. These generate 404s when the sitemap is spidered because, officially, the child pages don't exist at those URLs.

The sites are set up with permalinks as /%postname%/

And the code is:

<?php

require dirname(__FILE__) . '/wp-config.php';
define('SITE_URL' , 'http://www.theactualdomain.com/'); // Change this to the base URL of the site being mapped

header('Content-Type: text/xml');

echo GenerateSiteMap();

function GenerateSiteMap(){
    global $table_prefix;
    $dsn = array('mysql:host=' . DB_HOST . ';dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
    $dbh = new PDO($dsn[0], $dsn[1], $dsn[2]);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
    $dbh->setAttribute(PDO::ATTR_PERSISTENT, true);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);  
    // Pages
    $sql = "SELECT a.`post_name` As Slug, a.`post_date` As Date FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'page'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs0 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Posts    
    $sql = "SELECT a.`post_name` As Slug, a.`post_date` As Date FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'post'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs1 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Categories
    $sql = "Select b.`slug` As Slug, NOW() As Date From `".$table_prefix."term_taxonomy` a Inner Join `".$table_prefix."terms` b On b.`term_id` = a.`term_id` Where a.`taxonomy` = 'category' AND a.`count` > 0";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs2 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Tags
    $sql = "Select b.`slug` As Slug, NOW() As Date From `".$table_prefix."term_taxonomy` a Inner Join `".$table_prefix."terms` b On b.`term_id` = a.`term_id` Where a.`taxonomy` = 'post_tag' AND a.`count` > 0";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs3 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Archives
    $sql = "SELECT DISTINCT CONCAT(DATE_FORMAT(`post_date`, '%Y'), '/', DATE_FORMAT(`post_date`, '%m')) As Slug, NOW() As Date FROM `".$table_prefix."posts` WHERE `post_status` = 'publish' AND `post_type` = 'post'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs4 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    $dbh = null;$dsn = null;$sql = null;$stmnt1 = null;
    unset($dbh, $dsn, $stmnt1, $sql);
    // Home Page and Sitemap Starter
    $ret = '<?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" 
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">    
                    <url>
                        <loc>' . SITE_URL . '</loc>
                        <lastmod>' . date('c',time()) . '</lastmod>
                        <changefreq>daily</changefreq>
                        <priority>1.0</priority>
                    </url>';
    // Pages
    $ret .= FormatMapping($rs0, 'weekly', '1.0', 4);
    unset($rs0);
    // Posts
    $ret .= FormatMapping($rs1, 'weekly', '0.9', 0);
    unset($rs1);
    // Categories
    $ret .= FormatMapping($rs2, 'daily', '0.8', 1);
    unset($rs2);
    // Tags
    $ret .= FormatMapping($rs3, 'daily', '0.8', 2);
    unset($rs3);
    // Archives
    $ret .= FormatMapping($rs4, 'monthly', '0.7', 3);
    unset($rs4);
    $ret .= '</urlset>';
    return $ret;
}

function FormatMapping($rs, $cf = 'daily', $priority = '0.5', $type = 0){
    $ret = '';
    $url = '';
    $rCt = count($rs);
    for($i = 0; $i < $rCt; ++$i){
        $ret .= '<url>';
        switch($type){
            case 0: // Posts
            case 3: // Archives
            case 4: // Pages
                $url = SITE_URL . $rs[$i]['Slug'];
                break;  
            case 1: // Categories
                $url = SITE_URL . 'category/' . $rs[$i]['Slug'];
                break;
            case 2: // Tags
                $url = SITE_URL . 'tag/' . $rs[$i]['Slug'];
                break;
        }
        $ret .= '   <loc>' . $url . '</loc>
                    <lastmod>' . date('c', strtotime($rs[$i]['Date'])) . '</lastmod>
                    <changefreq>' . $cf . '</changefreq>
                    <priority>' . $priority . '</priority>
                </url>';
    }
    unset($rs);
    return $ret;
}

How can I set up that initial Page query to render the links properly?

No, using the WordPress object is not a solution. The whole point is to lessen the load on sites that already have heavy traffic when serving the sitemap.

If I understand you correctly, your child page URLs are incorrect because they should be of the form <SITE_URL><parent_slug>/<child_slug>, and your code is just returning <SITE_URL><child_slug>. To get the full URL, you need to use the post_parent column of the wp_posts table and recursively build up the URL (as there could be several levels of nesting).
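To make the idea concrete, here is a minimal sketch of walking the post_parent chain for a page. The sample rows, the `$pages` variable and the `buildPagePath` helper are hypothetical and exist only for illustration; the rows mimic the two relevant wp_posts columns and are keyed by page ID.

```php
<?php
// Hypothetical rows keyed by page ID, mimicking wp_posts.post_name / post_parent.
$pages = [
    10 => ['Slug' => 'services',   'post_parent' => 0],
    11 => ['Slug' => 'consulting', 'post_parent' => 10],
    12 => ['Slug' => 'pricing',    'post_parent' => 11],
];

// Walk up the post_parent chain, prepending each ancestor's slug.
// Illustrative helper, not part of WordPress or of the code in this thread.
function buildPagePath(array $pages, int $id): string
{
    $parts = [];
    $seen = [];
    while ($id !== 0 && isset($pages[$id]) && !isset($seen[$id])) {
        $seen[$id] = true; // guard against a cyclic parent chain
        array_unshift($parts, $pages[$id]['Slug']);
        $id = (int) $pages[$id]['post_parent'];
    }
    return implode('/', $parts);
}

echo buildPagePath($pages, 12), "\n"; // services/consulting/pricing
```

A top-level page (post_parent of 0) yields just its own slug, so the same routine covers both nested and non-nested pages.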

The following changes to your code seem to work:

Replace

// Pages
$sql = "SELECT a.`post_name` As Slug, a.`post_date` As Date FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'page'";

with

// Pages
$sql = "SELECT a.`id`, a.`post_name` As Slug, a.`post_date` As Date, a.`post_parent` FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'page'";

That gives you page and parent IDs, which you need to make the association.

Replace

$rs0 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);

with

$rs0 = array_map('reset', $stmnt1->fetchAll(PDO::FETCH_GROUP|PDO::FETCH_ASSOC));

See "PDO fetchAll() primary key as array group key" (https://stackoverflow.com/questions/15754461/pdo-fetchall-primary-key-as-array-group-key) for an explanation of that one. In short, it makes the keys of the $rs0 array the page IDs, which makes looking up the parent's slug much easier.
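The re-keying step can be illustrated without a database. The sketch below hard-codes hypothetical rows in the shape that fetchAll(PDO::FETCH_GROUP|PDO::FETCH_ASSOC) produces: the first selected column (the ID) becomes the array key and is dropped from the row, and each key maps to a list of rows. It flattens with current() instead of reset(), purely because current() takes its argument by value; for a freshly built one-row list both return the same first element.

```php
<?php
// Hypothetical data in the FETCH_GROUP | FETCH_ASSOC shape:
// page ID => list of rows (one row per ID, since IDs are unique).
$grouped = [
    10 => [['Slug' => 'services',   'Date' => '2020-01-01 00:00:00', 'post_parent' => '0']],
    11 => [['Slug' => 'consulting', 'Date' => '2020-02-01 00:00:00', 'post_parent' => '10']],
];

// Replace each one-row list with its single row; array_map keeps the
// keys when it is given exactly one array, so the page IDs survive.
$rs0 = array_map('current', $grouped);

echo $rs0[11]['Slug'], ' has parent ', $rs0[11]['post_parent'], "\n";
```

PDO also offers PDO::FETCH_UNIQUE, which, combined with PDO::FETCH_ASSOC, keys the result by the first column with one row per key, so it may be worth trying as a way to skip the flattening step altogether.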

Add a function to (recursively) get the ancestor slugs:

function GetPageSlug($rs, $page_id) {
    if ($page_id == 0 || !isset($rs[$page_id])) { // stop at the root, or at a parent that is not in the result set
        return '';
    }

    return GetPageSlug($rs, $rs[$page_id]['post_parent']) . $rs[$page_id]['Slug'] . '/';
}

Replace

$rCt = count($rs);
for($i = 0; $i < $rCt; ++$i){

with

foreach($rs as $i => $element) {

Since the array keys are now page IDs, we can't assume they run from 0 to count($rs) - 1. I haven't changed the code to use $element instead of $rs[$i], but you could. I wanted to keep my changes to a minimum.
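A small sketch of why the index-based loop no longer fits, using hypothetical rows keyed by page ID (the IDs 10 and 42 are made up):

```php
<?php
// Hypothetical page rows keyed by page ID rather than 0..n-1.
$rs = [
    10 => ['Slug' => 'services'],
    42 => ['Slug' => 'contact'],
];

// An index-based loop looks for $rs[0] and $rs[1], neither of which exists.
$missing = [];
for ($i = 0; $i < count($rs); ++$i) {
    if (!isset($rs[$i])) {
        $missing[] = $i;
    }
}

// foreach visits the keys that are actually present.
$slugs = [];
foreach ($rs as $id => $element) {
    $slugs[$id] = $element['Slug'];
}

print_r($missing); // indexes the for loop tried but could not find
print_r($slugs);   // slugs collected by foreach, still keyed by page ID
```

The other result sets (posts, categories, tags, archives) are still plain 0-based lists, and foreach handles those equally well, which is why one loop can serve every type.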

Finally, replace

case 4: // Pages
    $url = SITE_URL . $rs[$i]['Slug'];

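with the following, which also gives Posts and Archives their own assignment, since only page rows carry post_parent:

    $url = SITE_URL . $rs[$i]['Slug'];
    break;
case 4: // Pages
    $url = SITE_URL . GetPageSlug($rs, $rs[$i]['post_parent']) . $rs[$i]['Slug'];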

The final code:

<?php

require dirname(__FILE__) . '/wp-config.php';
define('SITE_URL' , 'http://www.theactualdomain.com/'); // Change this to the base URL of the site being mapped

header('Content-Type: text/xml');

echo GenerateSiteMap();

function GenerateSiteMap(){
    global $table_prefix;
    $dsn = array('mysql:host=' . DB_HOST . ';dbname=' . DB_NAME, DB_USER, DB_PASSWORD);
    $dbh = new PDO($dsn[0], $dsn[1], $dsn[2]);
    $dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
    $dbh->setAttribute(PDO::ATTR_PERSISTENT, true);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);  
    // Pages
    $sql = "SELECT a.`id`, a.`post_name` As Slug, a.`post_date` As Date, a.`post_parent` FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'page'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs0 = array_map('reset', $stmnt1->fetchAll(PDO::FETCH_GROUP|PDO::FETCH_ASSOC)); # https://stackoverflow.com/questions/15754461/pdo-fetchall-primary-key-as-array-group-key
    // Posts    
    $sql = "SELECT a.`post_name` As Slug, a.`post_date` As Date FROM `".$table_prefix."posts` a WHERE a.`post_status` = 'publish' AND a.`post_type` = 'post'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs1 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Categories
    $sql = "Select b.`slug` As Slug, NOW() As Date From `".$table_prefix."term_taxonomy` a Inner Join `".$table_prefix."terms` b On b.`term_id` = a.`term_id` Where a.`taxonomy` = 'category' AND a.`count` > 0";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs2 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Tags
    $sql = "Select b.`slug` As Slug, NOW() As Date From `".$table_prefix."term_taxonomy` a Inner Join `".$table_prefix."terms` b On b.`term_id` = a.`term_id` Where a.`taxonomy` = 'post_tag' AND a.`count` > 0";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs3 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    // Archives
    $sql = "SELECT DISTINCT CONCAT(DATE_FORMAT(`post_date`, '%Y'), '/', DATE_FORMAT(`post_date`, '%m')) As Slug, NOW() As Date FROM `".$table_prefix."posts` WHERE `post_status` = 'publish' AND `post_type` = 'post'";
    $stmnt1 = $dbh->prepare($sql);
    $stmnt1->execute();
    $rs4 = $stmnt1->fetchAll(PDO::FETCH_ASSOC);
    $dbh = null;$dsn = null;$sql = null;$stmnt1 = null;
    unset($dbh, $dsn, $stmnt1, $sql);
    // Home Page and Sitemap Starter
    $ret = '<?xml version="1.0" encoding="UTF-8"?>
                <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" 
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">    
                    <url>
                        <loc>' . SITE_URL . '</loc>
                        <lastmod>' . date('c',time()) . '</lastmod>
                        <changefreq>daily</changefreq>
                        <priority>1.0</priority>
                    </url>';
    // Pages
    $ret .= FormatMapping($rs0, 'weekly', '1.0', 4);
    unset($rs0);
    // Posts
    $ret .= FormatMapping($rs1, 'weekly', '0.9', 0);
    unset($rs1);
    // Categories
    $ret .= FormatMapping($rs2, 'daily', '0.8', 1);
    unset($rs2);
    // Tags
    $ret .= FormatMapping($rs3, 'daily', '0.8', 2);
    unset($rs3);
    // Archives
    $ret .= FormatMapping($rs4, 'monthly', '0.7', 3);
    unset($rs4);
    $ret .= '</urlset>';
    return $ret;
}

function GetPageSlug($rs, $page_id) {
    if ($page_id == 0 || !isset($rs[$page_id])) { // stop at the root, or at a parent that is not in the result set
        return '';
    }

    return GetPageSlug($rs, $rs[$page_id]['post_parent']) . $rs[$page_id]['Slug'] . '/';
}

function FormatMapping($rs, $cf = 'daily', $priority = '0.5', $type = 0){
    $ret = '';
    $url = '';
    foreach($rs as $i => $element) {
        $ret .= '<url>';
        switch($type){
            case 0: // Posts
            case 3: // Archives
                $url = SITE_URL . $rs[$i]['Slug'];
                break;
            case 4: // Pages (only these rows carry post_parent)
                $url = SITE_URL . GetPageSlug($rs, $rs[$i]['post_parent']) . $rs[$i]['Slug'];
                break;
            case 1: // Categories
                $url = SITE_URL . 'category/' . $rs[$i]['Slug'];
                break;
            case 2: // Tags
                $url = SITE_URL . 'tag/' . $rs[$i]['Slug'];
                break;
        }
        $ret .= '   <loc>' . $url . '</loc>
                    <lastmod>' . date('c', strtotime($rs[$i]['Date'])) . '</lastmod>
                    <changefreq>' . $cf . '</changefreq>
                    <priority>' . $priority . '</priority>
                </url>';
    }
    unset($rs);
    return $ret;
}
