Ignore HTML tags in preg_match

Question

I'm scraping a site with the following HTML:

<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a> 

<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a> 

<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a> 

<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>

Using this code, I get the text of the links:

preg_match_all('/<a class="name" href=".*?" data-hovercard-id=".*?">(.*)<\/a>/i', $file_string, $titles);

but the result is:

<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft

How do I ignore the HTML tags inside the links, so that the output looks like this?

 War World
 World of fun
 Save the world
 world of warcraft

A DOMDocument could be better.
Thanks. I've been trying to use DOMDocument, but I'm not familiar with how to use its XPath queries.
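Since the comment suggests DOMDocument, here is a minimal sketch of that approach using PHP's built-in DOMDocument and DOMXPath classes. It assumes the page HTML is already in $file_string (the question's sample markup is inlined here) and that the links carry exactly class="name". Replacing each <br> with a space is an added step, not part of the question, so that "Save the<br>world" comes out as "Save the world".

```php
<?php
// Sample markup from the question; in practice this is the scraped page.
$file_string = <<<'HTML'
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>
<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>
<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings about imperfect real-world HTML instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($file_string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Replace every <br> element with a space so adjacent words stay separate.
foreach (iterator_to_array($xpath->query('//br')) as $br) {
    $br->parentNode->replaceChild($doc->createTextNode(' '), $br);
}

$titles = [];
// Select <a> elements whose class attribute is exactly "name".
foreach ($xpath->query('//a[@class="name"]') as $a) {
    // textContent joins all descendant text nodes, so the tags disappear.
    // Collapse runs of whitespace and trim the ends.
    $titles[] = trim(preg_replace('/\s+/', ' ', $a->textContent));
}

print_r($titles);
```

The XPath expression //a[@class="name"] matches the class attribute as a whole string; if the links could carry several classes, a test such as contains(concat(' ', @class, ' '), ' name ') would be needed instead.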

Answer 1

Use strip_tags(). Here is an example:

<?php
$html = <<<EOF
<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft
EOF;

echo strip_tags($html);

Output:

War World
 World of fun
Save theworld
world of warcraft
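As the output shows, strip_tags() deletes the <br> without leaving a space, so "Save the<br>world" becomes "Save theworld" rather than the "Save the world" the question asks for. A minimal sketch of one workaround, assuming the question's preg_match_all call and sample markup: replace <br> variants with a space first, then strip the remaining tags and tidy whitespace on each captured title.

```php
<?php
// Sample markup from the question.
$file_string = <<<'HTML'
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>
<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>
<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>
HTML;

// The question's pattern: group 1 captures the raw inner HTML of each link.
preg_match_all('/<a class="name" href=".*?" data-hovercard-id=".*?">(.*)<\/a>/i', $file_string, $titles);

$clean = array_map(function ($inner) {
    // Turn <br>, <br/> and <br /> into a space so the words around them stay apart.
    $inner = preg_replace('/<br\s*\/?>/i', ' ', $inner);
    // Remove the remaining tags, collapse whitespace and trim the ends.
    return trim(preg_replace('/\s+/', ' ', strip_tags($inner)));
}, $titles[1]);

print_r($clean);
```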

Answer 2

Just remove the tags after you get the text:

<?php
$string = '<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft';
$convert = preg_replace('/<.*?>/','', $string);
print $convert;

Prints:

War World
 World of fun
Save theworld
world of warcraft
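The pattern /<.*?>/ has one caveat worth knowing, sketched below with a hypothetical input: without the s modifier, . does not match a newline, so any tag whose attributes wrap onto a second line is left in place. A negated character class such as [^>] also matches newlines, so /<[^>]*>/ removes such a tag.

```php
<?php
// Hypothetical input: a tag whose attributes are split across two lines.
$string = "<span\n class=\"highlighted\">War</span> World";

// '.' cannot cross "\n" here, so the opening <span ...> tag survives.
$lazyDot = preg_replace('/<.*?>/', '', $string);

// [^>] matches "\n" as well, so both tags are removed.
$negated = preg_replace('/<[^>]*>/', '', $string);

var_dump($lazyDot, $negated);
```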

Answer 3

You can remove the HTML tags after you've matched the link strings. For example:

// [^>]+ stops at the first '>', which is the one that closes the tag.
$str = preg_replace('/<[^>]+>/', '', $html);
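A small comparison of the two negated-class patterns, using a hypothetical input with a literal > in the text (real-world HTML often contains one, even though &gt; is the correct escape). With <[^<]+>, the greedy class can run past the end of a tag and backtrack to a later >, swallowing text; <[^>]+> cannot cross a >, so it only removes the tag itself.

```php
<?php
// Hypothetical input with a literal '>' inside the text.
$text = '<b>x > y</b>';

// [^<]+ runs past the tag's own '>' and backtracks to the later one,
// so "<b>x >" is removed as if it were a single tag.
$withNotLt = preg_replace('/<[^<]+>/', '', $text);

// [^>]+ cannot cross a '>', so only "<b>" and "</b>" are removed.
$withNotGt = preg_replace('/<[^>]+>/', '', $text);

var_dump($withNotLt, $withNotGt);
```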

3 answers

Answer 1: score 3, accepted, 2013-09-02 12:30:50

Answer 2: score 0, 2013-09-02 12:30:21

Answer 3: score 0, 2013-09-02 12:31:30
