preg_match_all to scrape a word found between HTML tags

I have the following piece of code, which should match the provided string against $contents. The $contents variable holds a web page's contents, fetched with the file_get_contents() function:

if (preg_match('~<p style="margin-top: 40px; " class="head">GENE:<b>(.*?)</b>~iU', $contents, $match)) {
    $found_match = $match[1];
}

The original markup on that web page looks like this:

<p style="margin-top: 40px; " class="head">GENE:<b>TSPAN6</b>

I would like to capture the string 'TSPAN6' from the web page with (.*?) and store it in $match[1]. However, the match does not seem to work. Any ideas?
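A side note on the pattern above: the trailing U modifier (PCRE_UNGREEDY) inverts greediness, so (.*?) actually behaves greedily and can run past the first </b> when a line contains several bold tags. The minimal sketch below uses a made-up sample string (not the real page) to show the difference:

```php
<?php
// Sample markup invented for illustration; it has two <b> elements on one line.
$html = '<p class="head">GENE:<b>TSPAN6</b> and <b>OTHER</b></p>';

// Without U, ".*?" is lazy and stops at the first </b>.
preg_match('~<b>(.*?)</b>~', $html, $lazy);

// With U, ".*?" becomes greedy and runs to the last </b> on the line.
preg_match('~<b>(.*?)</b>~U', $html, $inverted);

echo $lazy[1], PHP_EOL;     // TSPAN6
echo $inverted[1], PHP_EOL; // TSPAN6</b> and <b>OTHER
```

In practice, either drop the U modifier or write (.*) when U is present; combining U with .*? is rarely what is intended.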

Unfortunately, your suggestion did not work.

After some hours of looking through the HTML, I realized that the page actually has a blank space right after the colon (GENE: <b>), which my regex did not allow for. As such, the code snippet now looks like this:

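// The page contains "GENE: <b>" (with a space after the colon).
// (.*?) is lazy, so the capture stops at the first </b>.
$pattern = '#GENE: <b>(.*?)</b>#i';
preg_match($pattern, $contents, $match);
if (isset($match[1]))
{
    $found_match = $match[1];
}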
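A slightly more forgiving variant is sketched below, assuming PHP 7.1+ (for the nullable ?string return type). Instead of hard-coding exactly one space, it uses \s* so that zero or more whitespace characters after "GENE:" are accepted, which covers both the markup quoted in the question and the markup actually found on the page. The helper name extractGene is hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns the gene symbol, or null when there is no match.
function extractGene(string $contents): ?string
{
    // \s* accepts "GENE:<b>" as well as "GENE: <b>" (or any other whitespace).
    // [^<]+ captures up to the next tag, so it cannot run past </b>.
    if (preg_match('#GENE:\s*<b>([^<]+)</b>#i', $contents, $match)) {
        return trim($match[1]);
    }
    return null;
}

var_dump(extractGene('<p class="head">GENE:<b>TSPAN6</b>'));  // string(6) "TSPAN6"
var_dump(extractGene('<p class="head">GENE: <b>TSPAN6</b>')); // string(6) "TSPAN6"
var_dump(extractGene('<p class="head">no gene here</p>'));     // NULL
```

Returning null rather than an empty string makes "not found" easy to tell apart from an empty <b></b> element.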

Try this:

// Modifiers (si) must follow the closing delimiter; \s* tolerates a space after "GENE:".
preg_match( '#GENE:\s*<b>([^<]+)</b>#si', $contents, $match );
$found_match = ( isset($match[1]) ? $match[1] : false );
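In PHP, PCRE modifiers belong after the closing delimiter. If they end up inside it, as in '...</b>si#', they are no longer flags but literal characters that the subject must contain. A minimal check on an invented sample string:

```php
<?php
$html = 'GENE:<b>TSPAN6</b>';

// Misplaced: "si" is part of the pattern, so it requires the text "si" after </b>.
var_dump(preg_match('#GENE:<b>([^<]+)</b>si#', $html));     // int(0)

// Correct: "si" follows the closing "#" and is parsed as modifiers.
var_dump(preg_match('#GENE:<b>([^<]+)</b>#si', $html, $m)); // int(1)
echo $m[1], PHP_EOL;                                        // TSPAN6
```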