Regex to strip a string inside specific HTML tags

Question

I'm trying to strip out a string that occurs only once on a page obtained using cURL. Example:

<h3 class=" ">STRING IN QUESTION</h3>

or

<h3 class="active">STRING IN QUESTION</h3>

or

<h3 class=" active">STRING IN QUESTION</h3>

I would like to do this using preg_match, unless it can be accomplished with a less resource-intensive method.

Here is the regex I'm using, which produces zero results:

<h3\sclass="\s">(.*?)</h3>

EDIT:

Here is the actual code (a fixed URL is used here in place of the dynamic one). I discovered that when the page is pulled via cURL, the class attribute does not exist, but the match still fails as shown:

$ch = curl_init ("URL IN QUESTION"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);

preg_match('<h3>(.*?)</h3>', $page, $match);

print_r($match);

Prints nothing.

Answer 1

This does the trick:

$str='<h3 class=" ">STRING IN QUESTION</h3>';
preg_match('/<h3.*?>(.*?)<\/h3>/',$str,$match);
print_r($match);

Output:

Array
(
    [0] => <h3 class=" ">STRING IN QUESTION</h3>
    [1] => STRING IN QUESTION
)

Explanation:

<h3.*?> # Match h3 tags (non-greedy)
(.*?)   # Match everything after tag (non-greedy, captured)     
<\/h3>  # Match closing tag - Note the escaped forward slash!

However, that URL contains no <h3> tags. It does contain an <h1> tag, and to match it you need the dot in the regex to match newlines, which the trailing s modifier enables:

preg_match('/<h1.*?>(.*?)<\\/h1>/s',$page,$match);

Output:

Array
(
    [0] => <h1 class="">
<span class="pageTitle ">Braman Motorcars</span>
</h1>
    [1] => 
<span class="pageTitle ">Braman Motorcars</span>

)
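To make the effect of the s modifier concrete, here is a minimal sketch. The $page string is a hypothetical stand-in modeled on the output above, not the real page, and the trim/strip_tags cleanup step is an addition that the answer itself does not include.

```php
<?php
// Hypothetical sample markup, modeled on the output shown above.
$page = "<h1 class=\"\">\n<span class=\"pageTitle \">Braman Motorcars</span>\n</h1>";

// Without the s modifier, "." does not match newlines, so nothing matches.
$withoutS = preg_match('/<h1.*?>(.*?)<\/h1>/', $page, $m1);

// With the s modifier, "." also matches newlines.
$withS = preg_match('/<h1.*?>(.*?)<\/h1>/s', $page, $m2);

// The capture still holds the inner <span> and surrounding whitespace,
// so strip the tags and trim to get the bare text.
$title = $withS ? trim(strip_tags($m2[1])) : null;

var_dump($withoutS, $withS, $title);
```

Here $withoutS is 0, $withS is 1, and $title ends up as "Braman Motorcars".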

Answer 2

Maybe:

<h3\s+class="\s*(active)?">(.*?)</h3>

and then use group 1 to retrieve "active" or "" and group 2 for "STRING IN QUESTION" (in PHP, $match[1] and $match[2]).

I've never done any PHP, but maybe this would work:

$result = "not found";
// Group 1 is "active" or empty; group 2 is the text inside the tag.
if (preg_match('#<h3\s+class="\s*(active)?">(.*?)</h3>#', $page, $match))
{
    $result = $match;
}
print_r($result);
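As a rough check of this pattern, the sketch below runs it against the three variants from the question and collects both capture groups. The $samples and $results names are made up for illustration.

```php
<?php
// The three variants listed in the question.
$samples = [
    '<h3 class=" ">STRING IN QUESTION</h3>',
    '<h3 class="active">STRING IN QUESTION</h3>',
    '<h3 class=" active">STRING IN QUESTION</h3>',
];

$pattern = '#<h3\s+class="\s*(active)?">(.*?)</h3>#';

$results = [];
foreach ($samples as $html) {
    if (preg_match($pattern, $html, $match)) {
        // $match[1] is "active" or an empty string; $match[2] is the tag's text.
        $results[] = [$match[1], $match[2]];
    }
}

print_r($results);
```

All three variants match. Note that the pattern requires a class attribute, so it would not match the bare <h3> that the question's edit says cURL actually returns.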

Answer 3

Try with:

preg_match('#<h3\s?class="\s?(active)?">(.+)</h3>#', $yourString, $match);

Remember, in your regex you must always provide a delimiter.
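The delimiter rule is a plausible explanation for the failure in the question's edited code. A minimal sketch, assuming a hypothetical $html sample: when the pattern starts with "<", PHP treats "<" and ">" as the delimiter pair, so everything after the first ">" is parsed as modifiers.

```php
<?php
// Hypothetical sample input.
$html = '<h3>STRING IN QUESTION</h3>';

// "<" and ">" are read as delimiters, so the pattern is just "h3" and
// "(.*?)</h3>" is parsed as modifiers. PHP emits an "Unknown modifier"
// warning (suppressed here with @) and returns false.
$bad = @preg_match('<h3>(.*?)</h3>', $html, $m);

// "#" as delimiter: the "/" in the closing tag needs no escaping.
$hash = preg_match('#<h3>(.*?)</h3>#', $html, $m1);

// "/" as delimiter: the "/" in "</h3>" must be escaped.
$slash = preg_match('/<h3>(.*?)<\/h3>/', $html, $m2);

var_dump($bad, $m1[1], $m2[1]);
```

Choosing "#" (or another character that does not occur in the pattern) avoids escaping the slash in closing tags.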

3 answers

Answer 1: score 3, accepted, 2012-11-25 20:39:34

Answer 2: score 1, 2012-11-25 20:25:51

Answer 3: score 0, 2012-11-25 20:37:14
