preg_match_all not working with html tags
I am trying to get the content of the <tbody> tag from this page.
There is only one table, with a single <tbody> tag, and I want to get all rows from this table.
I tried to do it this way:
$page = file_get_contents('http://pk.zntu.edu.ua/fakultety-ta-napryamy-pidhotovky/derzhavne-zamovlennya-2011-bakalavr');
preg_match_all("/<tbody>(.+?)<\/tbody>/is", $page, $output_array);
var_dump($output_array);
And I receive empty arrays:
array(2) { [0]=> array(0) { } [1]=> array(0) { } }
I have tried different variants of the pattern, like
/<tbody>(.*?)<\\/tbody>/is
/<tbody>.+?<\\/tbody>/is
/<tbody>.*?<\\/tbody>/is
/<tbody>.+<\\/tbody>/is
/<tbody>.*<\\/tbody>/is
But none of them works.
With PCRE and Regex Library, everything should be okay.
I don't know what the problem is, please help.
Your pattern is very simple, and the regex above should be fine, but I think the problem comes from file_get_contents. I just tried counting the number of lines in the $page variable, and I got this:
71220
But the real source, which I checked by opening that website, copying the source code, and counting the lines manually, is about 1787 lines.
What does this mean?
It may mean that the code stored in your $page variable is not the same as the HTML you see when you open that website manually. When you open a website in a browser, many things can happen (e.g. listener methods run), but when you download the source directly into a PHP variable, some of those methods may never be executed, and this can leave you with incomplete HTML code.
Note that another piece of evidence supporting my assumption is that I cannot even find the keyword tbody in your $page variable.
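The check described above can be reproduced with a small helper. This is only a sketch: summarize_page is a hypothetical name, not something from the question, and the inline sample string stands in for the value returned by file_get_contents so the example runs without network access.

```php
<?php
// Hypothetical helper: summarize a fetched page so it can be compared
// with what the browser's "view source" shows.
function summarize_page(string $html): array
{
    return [
        // Count line breaks; add 1 for the last line when the string is non-empty.
        'lines'     => $html === '' ? 0 : substr_count($html, "\n") + 1,
        // Case-insensitive search for an opening tbody tag, with or without attributes.
        'has_tbody' => stripos($html, '<tbody') !== false,
    ];
}

// Made-up sample standing in for the result of file_get_contents($url).
$sample = "<table>\n<tr><td>1</td></tr>\n</table>";
var_dump(summarize_page($sample));
```

In real use you would pass $page to the helper after checking that file_get_contents did not return false. If has_tbody is false, no variant of the pattern can match, whatever quantifier is used.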
The tbody tag may also contain attributes, so you need to match those attributes as well in order to get the content of the tbody tag.
'/<tbody\b[^>]*>(.*?)<\/tbody>/is'
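To illustrate the difference, here is a minimal sketch that runs both patterns against a made-up HTML string (the sample markup and variable names are assumptions, not taken from the page in the question). It only helps if the fetched HTML actually contains a tbody tag.

```php
<?php
// Made-up sample: a tbody tag carrying an attribute.
$html = '<table><tbody class="rows"><tr><td>A</td></tr><tr><td>B</td></tr></tbody></table>';

// Pattern from the question: requires the literal text "<tbody>".
$strict = preg_match_all('/<tbody>(.+?)<\/tbody>/is', $html, $strictMatches);

// Pattern from this answer: \b[^>]* skips any attributes before ">".
$loose = preg_match_all('/<tbody\b[^>]*>(.*?)<\/tbody>/is', $html, $looseMatches);

var_dump($strict);           // number of matches for the strict pattern
var_dump($loose);            // number of matches for the attribute-tolerant pattern
var_dump($looseMatches[1]);  // captured inner HTML of each tbody
```

With this sample the strict pattern finds nothing, because the opening tag is not exactly "<tbody>", while the attribute-tolerant pattern captures the two rows.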