
WordPress - Insert posts programmatically while maintaining links

I am currently working on a migration script that imports articles from XML into WordPress.

So far I have parsed the XML into PHP arrays. I loop through these arrays and insert each article into WordPress, one by one, with the following code:

$post = array(
    'post_title'    => wp_strip_all_tags($article['title']),
    'post_content'  => $article['description'],
    'post_status'   => 'publish',
    'post_author'   => 1,
    'ping_status'   => 'closed',
    'post_date'     => $dateTime->format('Y-m-d H:i:s'),
    'post_type'     => $post_type
);

$result = wp_insert_post($post);

That all works. Here is the issue, however: the XML files are an export from a website (unfortunately I do not know which CMS), and the content can contain links to files on the same site, for example:

<![CDATA[<p><strong>Shortcuts:</strong></p>
<p/>
<ul>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/elakkeen_hakeminen_ulkomailta">(Booklet in Finnish)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/sa_har_soker_du_pension_fran_utlandet">(Booklet in Swedish)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/pensioni_taotlemine_valismaalt">(Booklet in Estonian)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/poluchenie_pensii_iz_drugih_stran">(Booklet in Russian)</a> 
</li>
</ul>]]>

Testsite.fi is my own site, so these are internal links.

These links point to PDFs, which should also be imported into WordPress, but their URLs will obviously change. I do have the referenced PDFs (for example elakkeen_hakeminen_ulkomailta.pdf), and they are in the same folder as this script. So all that is required is to upload each file to WordPress, either programmatically or by manually moving it to the correct location, and then update the links so they still work.

Any idea how to do this? I am guessing it involves regular expressions, but I can't figure it out.

To change all internal links, you can use this:

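// Point every internal href at the new WordPress install; [^"]* stops at the closing quote of each href.
$content = preg_replace('%href="http://www\.testsite\.fi/([^"]*)"%', 'href="' . get_bloginfo('wpurl') . '/$1"', $article['description'], -1);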

$post = array(
    'post_title'    => wp_strip_all_tags($article['title']),
    'post_content'  => $content,
    'post_status'   => 'publish',
    'post_author'   => 1,
    'ping_status'   => 'closed',
    'post_date'     => $dateTime->format('Y-m-d H:i:s'),
    'post_type'     => $post_type
);

$result = wp_insert_post($post);
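A minimal, framework-free sketch of the same rewrite, handy for testing the pattern outside WordPress. The function name `rewrite_internal_links` and the example URLs are hypothetical; inside WordPress you would pass `get_bloginfo('wpurl')` as `$newBase`. It uses `preg_quote()` so the old base URL does not need hand-escaping, and `[^"]*` so each match ends at the closing quote of its own href (requires PHP 7.4+ for the arrow function).

```php
<?php
// Hypothetical helper: point every href under $oldBase at $newBase,
// keeping the rest of the path and query string unchanged.
function rewrite_internal_links(string $html, string $oldBase, string $newBase): string
{
    // preg_quote() escapes regex metacharacters such as '.' in the old base URL;
    // the second argument also escapes the '%' delimiter.
    $pattern = '%href="' . preg_quote(rtrim($oldBase, '/'), '%') . '/([^"]*)"%';
    $newBase = rtrim($newBase, '/');

    // [^"]* cannot run past the closing quote, so several links on one line
    // are rewritten one by one instead of being merged by a greedy .*
    return preg_replace_callback(
        $pattern,
        fn(array $m): string => 'href="' . $newBase . '/' . $m[1] . '"',
        $html
    ) ?? $html; // preg_replace_callback() returns null on a regex error
}

$html = '<a href="http://www.testsite.fi/a?x=1">A</a> <a href="http://www.testsite.fi/b">B</a>';
echo rewrite_internal_links($html, 'http://www.testsite.fi', 'http://example.com/wp'), "\n";
// prints: <a href="http://example.com/wp/a?x=1">A</a> <a href="http://example.com/wp/b">B</a>
```

Using a callback instead of a `'$1'` replacement string also means a `$` or `\` in the new base URL is never misread as a back-reference.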

Since the PDF links in your example do not have a file extension, they can't be identified as PDFs programmatically. Otherwise it would be something along these lines:

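// 'url' is the current upload folder (it includes /YYYY/MM when date-based folders are enabled).
$upload_dir = wp_upload_dir();
$content = preg_replace('%href="http://www\.testsite\.fi/([^"]*)/([^"/]*)\.pdf"%', 'href="' . $upload_dir['url'] . '/$2.pdf"', $article['description'], -1);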

where $2 is the PDF's file name without the .pdf extension.
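The `.pdf` pattern above will not match the sample links, because they end in a `contentPath` query parameter rather than a file name. If, as the question states, each booklet exists locally as the last `contentPath` segment plus `.pdf` (for example `elakkeen_hakeminen_ulkomailta.pdf`), a callback can look that name up in the list of files you actually have and rewrite only those links. This is a sketch under that naming assumption; the function name and the uploads URL are hypothetical, and it assumes the files keep their names once copied into the uploads folder.

```php
<?php
// Hypothetical helper. Assumes each linked file exists locally as
// "<last contentPath segment>.pdf", e.g.
// contentPath=fi/julkaisut/esitteet/elakkeen_hakeminen_ulkomailta
//   -> elakkeen_hakeminen_ulkomailta.pdf
function rewrite_content_path_links(string $html, array $localPdfs, string $uploadsUrl): string
{
    $pattern = '%href="http://www\.testsite\.fi/[^"]*[?&]contentPath=([^"&]*)"%';
    $uploadsUrl = rtrim($uploadsUrl, '/');

    return preg_replace_callback($pattern, function (array $m) use ($localPdfs, $uploadsUrl): string {
        // Last segment of the contentPath value, plus the .pdf extension.
        $name = basename(urldecode($m[1])) . '.pdf';

        // Leave the link untouched when there is no matching local PDF.
        if (!in_array($name, $localPdfs, true)) {
            return $m[0];
        }
        return 'href="' . $uploadsUrl . '/' . rawurlencode($name) . '"';
    }, $html) ?? $html;
}

$html = '<a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/elakkeen_hakeminen_ulkomailta">(Booklet in Finnish)</a>';
echo rewrite_content_path_links($html, ['elakkeen_hakeminen_ulkomailta.pdf'], 'http://example.com/wp-content/uploads'), "\n";
// prints: <a href="http://example.com/wp-content/uploads/elakkeen_hakeminen_ulkomailta.pdf">(Booklet in Finnish)</a>
```

In the migration script, `$localPdfs` could be built with `array_map('basename', glob(__DIR__ . '/*.pdf') ?: [])`, and `$uploadsUrl` could be `wp_upload_dir()['url']` once the files have been copied there, for example with `wp_upload_bits()`. Because `wp_upload_bits()` may rename a file whose name is already taken, using the `url` it returns is safer than assuming the name is unchanged.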

Note:

The href part in the regex is not strictly necessary, but it ensures that you do not change URLs that are outside an href attribute. Depending on the scenario, you can leave that part out.
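To see what the `href` anchor changes, here is a small standalone comparison on made-up content (the URLs are hypothetical): with the anchor, only the link target is rewritten; without it, a plain-text mention of the old URL in the paragraph is rewritten as well.

```php
<?php
$content = '<p>Old site: http://www.testsite.fi/info</p><a href="http://www.testsite.fi/info">link</a>';
$new = 'http://example.com';

// Anchored on href=" : only URLs inside an href attribute are rewritten.
$withHref = preg_replace('%href="http://www\.testsite\.fi/([^"]*)"%', 'href="' . $new . '/$1"', $content);

// Not anchored: every occurrence is rewritten; the path stops at a quote,
// whitespace or the start of a tag.
$withoutHref = preg_replace('%http://www\.testsite\.fi/([^"\s<]*)%', $new . '/$1', $content);

echo $withHref, "\n";    // <p>Old site: http://www.testsite.fi/info</p><a href="http://example.com/info">link</a>
echo $withoutHref, "\n"; // <p>Old site: http://example.com/info</p><a href="http://example.com/info">link</a>
```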
