preg_match find and replace a string pattern

I have a WordPress database that has some embedded iframes from SoundCloud. I want the iframes to be replaced with a shortcode. I have already created the shortcode, and it works well.

The problem is that I have an old database with approximately 2,000 posts that already contain the embed code. What I want to do is write code that replaces each iframe with the shortcode.

Here is the code I am using to find the URL in the content, but it always returns blank.

$string = 'Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson\'s retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad\'s record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you\'ve ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>\'s Jeffree\'s sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

preg_match("/url=(.*?)/", $string, $matches);

print_r($matches);

The above code doesn't work, and I am not very familiar with regex, so it would be great if anyone could figure out what is wrong here. It would also be great if anyone could guide me to the right process for doing this.

Since you're working with HTML here, I would recommend using DOM functions:

$doc = new DOMDocument;
$doc->loadHTML($string);

foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $url = $iframe->getAttribute('src');
    // parse the query string
    parse_str(parse_url($url, PHP_URL_QUERY), $args);
    // save the modified attribute
    $iframe->setAttribute('src', $args['url']);
}

echo $doc->saveHTML();

This outputs the full document, so you would need to trim it down:

$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}

Output:

<p>Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson's retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad's record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you've ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>'s Jeffree's sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="http://api.soundcloud.com/playlists/8087281" frameborder="no" scrolling="no" width="100%" height="350"></iframe></p>
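
The DOM code above rewrites the src attribute but leaves the iframe in place. If the goal is to swap the whole iframe for a shortcode, one possible variation is to replace the node with a text node. This is only a sketch: the helper name iframes_to_shortcodes is hypothetical, the [soundcloud url="..."] format is taken from the other answers on this page, and the sample markup is a shortened stand-in for a real post.

```php
<?php
// Hypothetical helper, not part of the answer above.
function iframes_to_shortcodes(string $html): string
{
    $doc = new DOMDocument;
    @$doc->loadHTML($html); // @ hides libxml warnings about loose markup

    // Copy the live node list first; replacing nodes while iterating it skips items.
    foreach (iterator_to_array($doc->getElementsByTagName('iframe')) as $iframe) {
        $query = parse_url($iframe->getAttribute('src'), PHP_URL_QUERY);
        parse_str((string) $query, $args);
        if (empty($args['url'])) {
            continue; // no url= parameter, so leave this iframe alone
        }
        // parse_str() has already decoded %3A, %2F, etc.
        $shortcode = $doc->createTextNode('[soundcloud url="' . $args['url'] . '"]');
        $iframe->parentNode->replaceChild($shortcode, $iframe);
    }

    // Serialize only the children of <body>, as in the trimming step above.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

echo iframes_to_shortcodes(
    '<p>Download below. <iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000" width="100%"></iframe></p>'
);
```

The node list is copied with iterator_to_array() because getElementsByTagName() returns a live list, and removing nodes from it during iteration can cause later iframes to be skipped.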

This should do what you've specified:

$new_string = preg_replace('/(?:<iframe[^\>]+src="[^\"]*url=([^\"]*soundcloud\.com[^\"]*))"[^\/]*\/[^\>]*>/i', '[soundcloud url="$1"]', $string);

It's limited to iframes with the url=…soundcloud… part in the src attribute, and it replaces the entire iframe code with [soundcloud url="{part after url=}"].
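
Note that the captured part is still percent-encoded and still carries the player options after the first &amp;. If a plain URL is wanted in the shortcode, one option is preg_replace_callback, which lets the match be cleaned up before it is substituted. The following is a sketch under that assumption, with a slightly simplified pattern and a shortened sample string; it is not the pattern from the answer above.

```php
<?php
// Shortened stand-in for the post content from the question.
$string = 'Download below. <iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false" frameborder="no" width="100%" height="350"></iframe>';

$new_string = preg_replace_callback(
    // Match the whole iframe element and capture everything after url= up to the closing quote.
    '/<iframe[^>]+src="[^"]*url=([^"]*soundcloud\.com[^"]*)"[^>]*>\s*<\/iframe>/i',
    function (array $m): string {
        // Keep only the part before the first &amp; and decode %3A, %2F, etc.
        $url = urldecode(explode('&amp;', $m[1])[0]);
        return '[soundcloud url="' . $url . '"]';
    },
    $string
);

echo $new_string, "\n";
// Download below. [soundcloud url="http://api.soundcloud.com/playlists/8087281"]
```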

For a one-time fix, you might consider an SQL solution. Some assumptions with the following SQL:

  • There is only ONE iframe per post to be replaced (the SQL can be run multiple times if there are posts with more than one iframe).
  • The iframes to be replaced are ALL in the form:

<iframe src="https://w.soundcloud.com/player/?url="..." other-stuff</iframe>

  • All you care about is what's between the quotes for the url parameter.
  • The end result is [soundcloud url="..."].

If all of this is true, then the following SQL should do the trick. It can be tweaked if you want a different shortcode, etc.

Be sure to back up your wp_posts table before performing ANY mass update.

CREATE TABLE wp_posts_backup SELECT * FROM wp_posts
;

Once the backup is complete, the following SQL should fix all of your posts in a single shot:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;
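
The + 50 that appears throughout the statement is the length of the literal iframe prefix being searched for. As a rough cross-check, the string functions can be mirrored in PHP on a sample value before touching the database. This is only an illustration of the arithmetic, assuming a shortened sample post; it does not run any SQL.

```php
<?php
$prefix = '<iframe src="https://w.soundcloud.com/player/?url=';
$post   = 'Download below. ' . $prefix
        . 'http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000" width="100%"></iframe>';

echo strlen($prefix), "\n"; // 50

// MySQL's LOCATE() is 1-based, PHP's strpos() is 0-based, hence the + 1.
$locate = strpos($post, $prefix) + 1;

// SUBSTR(post, LOCATE(...) + 50) is 1-based too, so subtract 1 again for substr().
$rest = substr($post, $locate + strlen($prefix) - 1);

// SUBSTRING_INDEX(rest, '&amp;', 1): everything before the first &amp;
$encoded = explode('&amp;', $rest)[0];

// The two nested REPLACE() calls.
$decoded = str_replace(['%3A', '%2F'], [':', '/'], $encoded);
echo $decoded, "\n"; // http://api.soundcloud.com/playlists/8087281
```

If the embeds in your database use a different prefix (for example http:// instead of https://), both the literal and the offset would need to change together.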

I would suggest you TEST a few posts before running this against all of them. An easy way to test would be to add the following to the WHERE clause above (immediately before ';'), changing each '?' to a post ID to be tested.

AND p.ID IN (?,?,?)

If for any reason you need to restore your posts, you can do something like:

UPDATE wp_posts p
  JOIN wp_posts_backup b
    ON b.ID = p.ID
   SET p.post_content = b.post_content
;

One other thing to consider: I wasn't sure if you wanted to pass on the parameters that are currently part of the URL, so I included them. You can easily remove them by changing:

                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'

to:

                           ,'"]'

resulting in:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Updated to allow for no parameters in the URL:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX(
                                     SUBSTRING_INDEX( SUBSTR( p.post_content
                                                            , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                            )
                                                    , '&amp;', 1
                                                    )
                                                , '"', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Good luck.

<?php
    preg_match("/url\=([^\"]+)/i", $string, $matches);

Basically, you want to match one or more characters after url=, stopping at the closing double quote.
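
To see why the pattern in the question returns blank, it may help to compare the two side by side. The lazy group (.*?) has nothing after it in the pattern, so it is satisfied by matching zero characters, whereas the negated character class keeps consuming until it reaches the quote. The sample string below is a shortened stand-in for the iframe in the question, and trimming at &amp; plus urldecode() is an optional extra step, not part of the answer above.

```php
<?php
// Shortened stand-in for the iframe markup from the question.
$string = '<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000" width="100%"></iframe>';

// Lazy group with nothing after it: it is allowed to match zero characters.
preg_match('/url=(.*?)/', $string, $lazy);

// Negated character class: consume everything up to the closing quote.
preg_match('/url=([^"]+)/i', $string, $fixed);

// The capture still carries the player options after &amp;, so cut there
// and decode the percent-escapes to get the plain URL.
$trackUrl = urldecode(explode('&amp;', $fixed[1])[0]);

var_dump($lazy[1]);   // string(0) ""
echo $trackUrl, "\n"; // http://api.soundcloud.com/playlists/8087281
```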

I'd suggest looking into simplehtmldom. It is a DOM parser that uses selectors similar to jQuery and CSS.

http://simplehtmldom.sourceforge.net/

include 'simple_html_dom.php'; // adjust the path to wherever the library lives

$html = str_get_html($html_from_database);
// Find all iframes
foreach ($html->find('iframe') as $element) {
    $source = $element->src; // extract the source from the iframe
    // This is where you do your magic, like changing links.
    $element->src = $source; // this is where you replace the old source
}


// UPDATE $html back into the table.

Make sure you make a complete backup of all tables before you UPDATE any of them after parsing :)

http://simplehtmldom.sourceforge.net/manual.htm
