
Regex to strip string inside specific HTML tag

I'm trying to pull out a string that occurs only once on a page fetched with cURL. Example:

<h3 class=" ">STRING IN QUESTION</h3>

or

<h3 class="active">STRING IN QUESTION</h3>

or

<h3 class=" active">STRING IN QUESTION</h3>

I would like to do this using preg_match, unless it can be accomplished with a less resource-intensive method.

Here is the regex I'm using, which produces zero results:

<h3\sclass="\s">(.*?)</h3>

EDIT:

Here is the actual code (a fixed URL is used here in place of the dynamic one). I discovered that when the page is pulled via cURL, the class attribute does not exist, but the code still does not work as shown:

$ch = curl_init ("URL IN QUESTION"); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);

preg_match('<h3>(.*?)</h3>', $page, $match);

print_r($match);

Prints nothing.

This does the trick:

$str='<h3 class=" ">STRING IN QUESTION</h3>';
preg_match('/<h3.*?>(.*?)<\/h3>/',$str,$match);
print_r($match);

Output:

Array
(
    [0] => <h3 class=" ">STRING IN QUESTION</h3>
    [1] => STRING IN QUESTION
)

Explanation:

<h3.*?> # Match the opening h3 tag, with any attributes (non-greedy)
(.*?)   # Match the tag's content (non-greedy, captured)
<\/h3>  # Match the closing tag - note the escaped forward slash!
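To see why the non-greedy quantifier matters, here is a minimal sketch that runs a greedy and a non-greedy variant of the pattern against a made-up input with two headings on one line. The $html string and the variable names are assumptions for illustration only; the page in the question contains the tag just once.

```php
<?php
// Hypothetical input: two h3 elements on one line (not taken from the question's page).
$html = '<h3 class=" ">FIRST</h3><h3 class="active">SECOND</h3>';

// Greedy capture: (.*) extends to the LAST </h3> on the line.
preg_match('/<h3.*?>(.*)<\/h3>/', $html, $greedy);

// Non-greedy capture: (.*?) stops at the FIRST </h3>.
preg_match('/<h3.*?>(.*?)<\/h3>/', $html, $lazy);

echo $greedy[1], "\n"; // FIRST</h3><h3 class="active">SECOND
echo $lazy[1], "\n";   // FIRST
```

With a single tag on the page both variants return the same text, so the non-greedy form mainly protects against a second heading appearing later.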

However, that URL contains no <h3> tags. It does contain an <h1> tag, and because that tag's content spans several lines, matching it requires the trailing s modifier, which makes . match newlines as well:

preg_match('/<h1.*?>(.*?)<\/h1>/s',$page,$match);

Output:

Array
(
    [0] => <h1 class="">
<span class="pageTitle ">Braman Motorcars</span>
</h1>
    [1] => 
<span class="pageTitle ">Braman Motorcars</span>

)
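The effect of the s modifier can be checked in isolation. The sketch below hard-codes the <h1> markup from the output above instead of fetching it with cURL (that substitution, and the variable names, are assumptions), then cleans up the captured text with strip_tags and trim.

```php
<?php
// Markup copied from the output above; in the real script it would come from curl_exec().
$page = "<h1 class=\"\">\n<span class=\"pageTitle \">Braman Motorcars</span>\n</h1>";

// Without the s modifier, . does not match "\n", so the pattern cannot span lines.
$without = preg_match('/<h1.*?>(.*?)<\/h1>/', $page, $m1);

// With the s modifier, . also matches newlines.
$with = preg_match('/<h1.*?>(.*?)<\/h1>/s', $page, $m2);

// Drop the inner <span> and the surrounding whitespace from the capture.
$title = trim(strip_tags($m2[1]));

var_dump($without, $with); // int(0), int(1)
echo $title, "\n";         // Braman Motorcars
```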

Maybe:

<h3\s+class="\s*(active)?">(.*?)</h3>

and then read capture group 1 ($match[1] in PHP) to retrieve "active" or "", and group 2 ($match[2]) for "STRING IN QUESTION".

I've never written any PHP, but maybe this would work:

$result = "not found";
if (preg_match('#<h3\s+class="\s*(active)?">(.*?)</h3>#', $page, $match))
{
    // $match[1] is "active" or "", $match[2] is the heading text
    $result = $match;
}
print_r($result);
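As a quick check, the pattern above can be run against the three variants given in the question. This is only a sketch: the $samples array and the $results structure are assumptions added for the demonstration.

```php
<?php
// The three variants from the question.
$samples = [
    '<h3 class=" ">STRING IN QUESTION</h3>',
    '<h3 class="active">STRING IN QUESTION</h3>',
    '<h3 class=" active">STRING IN QUESTION</h3>',
];

$pattern = '#<h3\s+class="\s*(active)?">(.*?)</h3>#';

$results = [];
foreach ($samples as $sample) {
    if (preg_match($pattern, $sample, $match)) {
        // $match[1] is "active" or "", $match[2] is the heading text.
        $results[] = [$match[1], $match[2]];
    }
}

print_r($results);
```

Note that this pattern requires a class attribute, so it would not match the bare <h3> that the question's edit says cURL returns.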

Try with:

preg_match('#<h3\s?class="\s?(active)?">(.+)</h3>#', $yourString, $match);

Remember, a regex passed to a preg_* function must always be wrapped in delimiters.
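The missing delimiter is also what breaks the code in the question. The sketch below reproduces that on a hard-coded string; the $page value and variable names are assumptions, and the @ operator is used only to keep the warning out of the output.

```php
<?php
// Stand-in for the fetched page.
$page = '<h3>STRING IN QUESTION</h3>';

// No explicit delimiters: PHP treats < and > as the delimiter pair, so the
// pattern is just "h3" and "(.*?)</h3>" is parsed as modifiers. The call
// raises a warning (suppressed here with @) and returns false.
$bad = @preg_match('<h3>(.*?)</h3>', $page, $badMatch);

// With # as the delimiter, the slash in </h3> needs no escaping.
$good = preg_match('#<h3>(.*?)</h3>#', $page, $goodMatch);

var_dump($bad);           // bool(false)
echo $goodMatch[1], "\n"; // STRING IN QUESTION
```

Checking the return value for false, rather than only printing $match, makes this kind of pattern error visible instead of silently printing nothing.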
