How do I replace every URL in a string with a different, unique URL?

Question

I have the following:

$reg[0] = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$reg[1] = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$replace[0] = '<a$1href="http://www.yahoo.com"$3>';
$replace[1] = '<a$1href="http://www.live.com"$3>';
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';
echo preg_replace($reg, $replace, $string);

The result is:

Test <a href="http://www.live.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

I expected this result instead (the difference is in the first link):

Test <a href="http://www.yahoo.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test

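The idea is to replace the URL of every link in a string with a different, unique URL. This is for a newsletter system: I want to track which links people click, so each URL will be a "fake" URL, and once the click has been recorded the user is redirected to the real URL.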

Answer 1

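The problem is that the output of your first replacement is matched by the second search pattern. preg_replace() applies each pattern/replacement pair in turn to the whole string, so the second replacement effectively overwrites the first.

Unless you can somehow distinguish the "modified" links from the original ones so that they are not caught by the other expression (perhaps by adding an extra HTML attribute?), I don't think you can really solve this with a single preg_replace() call. Regex differentiation aside, one possible solution that comes to mind is preg_match_all(), since it gives you an array of matches. You can then loop over that array and run str_replace() on each matched URL, replacing it with its tracking URL.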
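
The loop described above could look roughly like the following. This is only a sketch under a few assumptions: the $trackingUrls array is hypothetical and must hold at least one entry per link, and substr_replace() with the byte offsets from PREG_OFFSET_CAPTURE is swapped in for str_replace(), because str_replace() would rewrite both identical google.com URLs at once.

```php
<?php
$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

// Hypothetical tracking URLs, one per link, in document order.
$trackingUrls = array('http://www.yahoo.com', 'http://www.live.com');

// Capture every href value together with its byte offset in $string.
preg_match_all('`<a\s[^>]*href="([^"]*)"[^>]*>`si', $string, $matches, PREG_OFFSET_CAPTURE);

// Replace from the last match to the first so earlier offsets stay valid.
for ($k = count($matches[1]) - 1; $k >= 0; $k--) {
    list($url, $offset) = $matches[1][$k];
    $string = substr_replace($string, $trackingUrls[$k], $offset, strlen($url));
}

echo $string;
// Test <a href="http://www.yahoo.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test
```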

Answer 2

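I'm not good with regular expressions, but if all you are doing is replacing external URLs (i.e. ones that are not part of your site/application) with internal URLs that track the click and redirect the user, then it should be easy to construct a regular expression that matches only external URLs.

So, say your domain is foo.com. You then just need a regular expression that matches only hyperlinks whose URL does not start with http://foo.com. As I said, I'm pretty bad with regular expressions, but here is my best attempt: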

$reg[0] = '`<a(\s[^>]*)href="(?!http://foo.com)([^"]*)"([^>]*)>`si';

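Edit: If you also want to track clicks on links to internal URLs, just replace http://foo.com with the URL of your redirect/tracking page, e.g. http://foo.com/out.php.

I'll walk through an example scenario to illustrate what I mean. Say you have the following newsletter: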

<h1>Newsletter Name</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis,
ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor
suscipit sapien, <a href="http://foo.com">eget auctor</a> ipsum ligula
non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus.
Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p>

For the purposes of this exercise, the search pattern will be:

// Only match links that don't begin with: http://foo.com/out.php
`<a(\s[^>]*)href="(?!http://foo.com/out\.php)([^"]*)"([^>]*)>`si

This regular expression can be broken down into 3 parts:

<a(\s[^>]*)href="
(?!http://foo.com/out\.php)([^"]*)
"([^>]*)>

On the first pass of the search, the script will examine:

<a href="http://bar.com">

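This link satisfies all 3 parts of the regexp, so the URL is stored in a database and replaced with http://foo.com/out.php?id=1.

On the second pass of the search, the script will examine: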

<a href="http://foo.com/out.php?id=1">

This link matches parts 1 and 3, but not part 2, so the search moves on to the next link:

<a href="http://foo.com">

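This link satisfies all 3 parts of the regexp, so the URL is stored in the database and replaced with http://foo.com/out.php?id=2.

On the third pass of the search, the script will examine the first 2 (already replaced) links, skip them, and then find a match with the last link in the newsletter.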
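
The passes above can be sketched as a small loop. It assumes PHP 5.3+ for the closure; the $stored array is a hypothetical stand-in for the database table, and the ids are simply counted up rather than taken from a real insert.

```php
<?php
// Only match links that don't begin with the tracking page URL.
$pattern = '`<a(\s[^>]*)href="(?!http://foo\.com/out\.php)([^"]*)"([^>]*)>`si';

$html = '<p><a href="http://bar.com">sed sollicitudin</a> '
      . '<a href="http://foo.com">eget auctor</a> '
      . '<a href="http://last.fm">laoreet lacus</a></p>';

$stored = array(); // hypothetical stand-in for the database: id => original URL
$id = 0;

do {
    // Limit of 1: each pass rewrites the first link that is not yet a tracking link.
    $html = preg_replace_callback($pattern, function ($m) use (&$id, &$stored) {
        $id++;
        $stored[$id] = $m[2]; // remember the real URL for the redirect
        return '<a' . $m[1] . 'href="http://foo.com/out.php?id=' . $id . '"' . $m[3] . '>';
    }, $html, 1, $count);
} while ($count > 0); // stop once a pass finds nothing left to replace

echo $html;
```

A single preg_replace_callback() call without the limit should give the same result in one pass, since replaced text is not rescanned; the loop is kept here only to mirror the pass-by-pass description.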

Answer 3

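I'm not sure I understood you correctly, but I wrote the following snippet. The regular expression matches the hyperlinks. The code then loops over the results and compares each link's text node with the hyperlink references. When a text node is found inside a hyperlink reference, it extends the match by inserting a sample trackback link with a unique key.

Update: the snippet now finds all hyperlinks:

Find the links
Build the trackback links
Find the position of each found link (matches[3]) and set a template tag
Replace the template tags with the trackback links; every link position is unique.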

$string = '<h1>Newsletter Name</h1><p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>. Donec lobortis, ligula <a href="http://bar.com">sed sollicitudin</a> dignissim, lacus dolor suscipit sapien, <a href="http://foo.com">bar.com</a> ipsum ligula non tortor. Quisque sagittis sodales elit. Mauris dictum blandit lacus. Mauris consequat <a href="http://last.fm">laoreet lacus</a>.</p>';

$regex = '<[^>]+>(.*)<\/[^>]+>';
preg_match_all("'<a\s+href=\"(.*)\"\s*>(.*)<\/[^>]+>'U",$string,$matches);


$uniqueURL = 'http://www.yourdomain.com/trackback.php?id=';

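// $matches[1] holds the href values, $matches[2] the link texts.
foreach($matches[2] as $k2 => $m2){
    foreach($matches[1] as $k1 => $m1){
        // Case-insensitive check: does the href contain the link text?
        if(stristr($m1, $m2)){
            // Build a trackback URL with a unique key and store it in $matches[3].
            $uniq = $uniqueURL.md5($matches[0][$k2])."_".rand(1000,9999);
            $matches[3][$k1] = $uniq."&refLink=".$m1;
        }
    }
}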


// First pass: swap the first remaining occurrence of each href for a placeholder tag.
foreach($matches[3] as $key => $val) {

    $startAt = strpos($string, $matches[1][$key]);
    $endAt= $startAt + strlen($matches[1][$key]);

    $strBefore = substr($string,0, $startAt);
    $strAfter = substr($string,$endAt);

    $string = $strBefore . "@@@$key@@@" .$strAfter;

}
// Second pass: swap each placeholder tag for its trackback URL.
foreach($matches[3] as $key => $val) {
    $string = str_replace("@@@$key@@@",$matches[3][$key] ,$string);
}
print "<pre>";
echo $string;

Answer 4

Before PHP 5.3, where you can simply create a function on the spot, you have to use create_function (which I hate) or a helper class.

/**
 * For retrieving a new string from a list.
 */
class StringRotation {
    var $i = -1;
    var $strings = array();

    function addString($string) {
        $this->strings[] = $string;
    }

    /**
     * Use sprintf to produce result string
     * Rotates forward
     * @param array $params the string params to insert
     * @return string
     * @uses StringRotation::getNext()
     */
    function parseString($params) {
        $string = $this->getNext();
        array_unshift($params, $string);
        return call_user_func_array('sprintf', $params);
    }

    function getNext() {
        $this->i++;
        $t = count($this->strings);
        // Wrap around once the pointer moves past the last string (valid indexes are 0 .. $t - 1).
        if ($this->i >= $t) {
            $this->i = 0;
        }
        return $this->strings[$this->i];
    }

    function resetPointer() {
        $this->i = -1;
    }
}

$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$replaceLinks[0] = '<a%2$shref="http://www.yahoo.com"%4$s>';
$replaceLinks[1] = '<a%2$shref="http://www.live.com"%4$s>';

$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$linkReplace = new StringRotation();
foreach ($replaceLinks as $replaceLink) {
    $linkReplace->addString($replaceLink);
}

echo preg_replace_callback($reg, array($linkReplace, 'parseString'), $string);
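
For comparison, on PHP 5.3 or later the same rotation can be sketched with a closure instead of the helper class. The $urls array and the $i counter are illustrative names only, and the modulo wrap-around is an assumption about what should happen when there are more links than replacement URLs.

```php
<?php
$reg = '`<a(\s[^>]*)href="([^"]*)"([^>]*)>`si';
$urls = array('http://www.yahoo.com', 'http://www.live.com');

$string = 'Test <a href="http://www.google.com">Google!!</a>Test <a href="http://www.google.com">Google!!2</a>Test';

$i = 0; // counts how many links have been rewritten so far
$result = preg_replace_callback($reg, function ($m) use ($urls, &$i) {
    // Pick the next URL, wrapping around when the list is exhausted.
    $url = $urls[$i % count($urls)];
    $i++;
    // $m[1] and $m[3] are the attribute text before and after href.
    return '<a' . $m[1] . 'href="' . $url . '"' . $m[3] . '>';
}, $string);

echo $result;
// Test <a href="http://www.yahoo.com">Google!!</a>Test <a href="http://www.live.com">Google!!2</a>Test
```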

4 answers

Answer 1
2 2009-04-18 07:34:21

Answer 2
1 2009-04-18 08:46:27

Answer 3
1 2009-04-19 09:43:35

Answer 4
0 2009-04-19 20:56:26
