简体   繁体   中英

wordpress - Insert posts programmatically while maintaining links

I am currently working on a migration script to insert articles from XML into Wordpress.

So far I parsed the XML and created arrays in PHP, I am looping through these arrays and insert them all one by one into Wordpress with the following code:

$post = array(
            'post_title'    => wp_strip_all_tags($article['title']),
            'post_content'  => $article['description'],
            'post_status'   => 'publish',
            'post_author'   => 1,
            'ping_status'   => 'closed',
            'post_date'     => $dateTime->format('Y-m-d H:i:s'),
            'post_type'     => $post_type
        );

        $result = wp_insert_post($post);

That all goes well, however here comes the issue: the XML's are an export from a website (unfortunately I do not know which CMS ) and in the content there can be links to files on the same site, for example:

<![CDATA[<p><strong>Shortcuts:</strong></p>
<p/>
<ul>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/elakkeen_hakeminen_ulkomailta">(Booklet in Finnish)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/sa_har_soker_du_pension_fran_utlandet">(Booklet in Swedish)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/pensioni_taotlemine_valismaalt">(Booklet in Estonian)</a> 
</li>
<li><a href="http://www.testsite.fi/julkaisut/5440/julkaisut?contentPath=fi/julkaisut/esitteet/poluchenie_pensii_iz_drugih_stran">(Booklet in Russian)</a> 
</li>
</ul>]]>

Testsite.fi is my own site, so these are internal links.

Those links are referring to PDF's and this should be inserted into wordpress, but obviously the links will be different. I do have the PDF's that are being referred to ( for example: elakkeen_hakeminen_ulkomailta.pdf, and they are in same folder as this script is ) so all that is required is to upload this file in Wordpress programmatically or manually move it to the correct location, and then update the links so that it still works.

Any clue how to do this? I am guessing something with regular expressions, but can't really figure it out.

To change all internal links you can use this:

$content = preg_replace('%href="http://www\.testsite\.fi/(.*)"%', 'href="' get_bloginfo('wpurl') . '/$1"', $article['description'], -1);

$post = array(
    'post_title'    => wp_strip_all_tags($article['title']),
    'post_content'  => $content,
    'post_status'   => 'publish',
    'post_author'   => 1,
    'ping_status'   => 'closed',
    'post_date'     => $dateTime->format('Y-m-d H:i:s'),
    'post_type'     => $post_type
);

$result = wp_insert_post($post);

Since the pdfs in your example do not have a filetype they can't be identified programmatically. Otherwise it would be something along the lines of:

$upload_dir = wp_upload_dir();
$content = preg_replace('%href="http://www\.testsite\.fi/(.*)/(.*).pdf"%', 'href="' . $upload_dir['url'] . '/$2.pdf"', $article['description'], -1);

where $2 is the filename for the pdf.

Note:

The href part in the regex is not neccesary but assures that you are not changing urls that are not inside a href atrribute. Depending on the scenario you can leave that part out.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM