PHP function to replace links longer than x characters in an RSS feed

Question

I am parsing an RSS feed, and some pages of the feed contain very long URLs that break my page layout. For example, some look like this when displayed on the page:

http://example.com/coolthings/893748662/photos/37774656-ID/another_dam_dir/MORESTUFF.php?id=7837839946HS67355

Because they used the actual URL as the text of some links instead of descriptive text, I am left with crazy long URL-based links. I want to build some code into the parser that automatically detects these long links anywhere in the page content and shortens them to something like http://example.com/coolthings/ .

I would like a PHP function that scans the page and shortens the crazy long URLs wherever they are found. I have looked all over and even tried to write something that does this, but my attempts have failed.

I can use preg_match_all to find the URLs, but I can't seem to find a simple way of replacing them within the page.

The page is parsed via curl, SimpleXML, and a few regexes. Any ideas on how I can do this would be greatly appreciated.

// Not-working attempt. This finds the crazy long URLs, but they must then be replaced in the page with the short version, and I can't seem to make that part work. I also wrote a function, cutto_mid, to trim a string in the middle. I will use that once I get this link-replacement code working.

function short_link($text){
    $regex = "a[\s]+[^>]*?href[\s]?=[\s\"\']+"."(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>";
    preg_match_all ("/$regex/i", $text, &$matches);
    $matches = $matches[1];
    foreach($matches as $vars){
        if (strlen($vars) > '95'){
            //return "<BR>".cutto_mid($vars, 20, 20, 50); //testing ...
            return preg_replace("#$regex#i",'<a href="'.$vars .'">Replace W Short Link</a>', $text);
        }
    }
}

Answer 1

You should not be using regular expressions to parse HTML, especially when you're already using an XML parser. Instead, use XPath to fetch all anchors with an href, then test the length of their value. Something like this:

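// CUTOFF is not defined in the question; 95 is assumed here to match its threshold.
define ('CUTOFF', 95);

$links = $xml->xpath ("//a[@href]");
foreach ($links as $l) {
    $label = (string) $l; // visible link text
    if (strlen ($label) < CUTOFF)
        continue;

    // Overwrites the link text only; the href attribute is left intact.
    $l[0] = substr ($label, 0, CUTOFF-3).'...';
}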
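
If the item content in the feed is HTML rather than well-formed XML, SimpleXML may refuse to load it. Below is a minimal sketch of the same XPath idea using DOMDocument, which tolerates sloppy markup. The names shorten_long_links and cut_middle and the 95/20/20 defaults are illustrative assumptions, not taken from the question; cut_middle only stands in for the asker's cutto_mid helper, whose code is not shown.

```php
<?php
// Hypothetical helper: keep the first $head and last $tail bytes of $s,
// joined by '...'. Strings that are already short are returned unchanged.
function cut_middle(string $s, int $head = 20, int $tail = 20): string
{
    if (strlen($s) <= $head + $tail + 3) {
        return $s;
    }
    return substr($s, 0, $head) . '...' . substr($s, -$tail);
}

// Hypothetical function: shorten the visible text of every link whose text
// is longer than $cutoff bytes. The href attribute is never modified.
function shorten_long_links(string $html, int $cutoff = 95): string
{
    $doc = new DOMDocument();

    // Silence warnings about imperfect markup, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    // The wrapper div lets us recover just the fragment afterwards;
    // the XML prolog hints to libxml that the input is UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $a) {
        $label = trim($a->textContent);
        if (strlen($label) <= $cutoff) {
            continue;
        }
        // Replace the children with a text node so characters such as '&'
        // in the URL are escaped correctly on output.
        while ($a->firstChild) {
            $a->removeChild($a->firstChild);
        }
        $a->appendChild($doc->createTextNode(cut_middle($label)));
    }

    // Serialize only the children of the wrapper div.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Calling shorten_long_links($content) on each item's HTML before printing it should leave short links untouched and trim only the link text of long ones, so the links still point at the original URLs.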

Solution 1
0 2015-07-13 15:19:12
