
preg_match: parse HTML with PHP

I have a problem parsing an HTML file with PHP's preg_match() function.

Here is a sample line of the HTML file:

<DIV STYLE="top:214px; left:506px; width:88px" Class="S7">15:03</DIV>

The div styles lay out a table that I want to read. It works like a coordinate system. To get the content, I need what is inside the div tag. But to know which cell the content belongs to, I need the top, left, and width values. The div class is not always S7. The content (here 15:03) can be either numbers (four digits, e.g. 1234), times (00:00), or letters (AAA).
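A pattern that accepts only the three content formats described above could look like the sketch below. It assumes "letters" means uppercase A-Z only; `$contentPattern` is a hypothetical name, and the pattern would need adjusting if lowercase or mixed content occurs in the real file.

```php
<?php
// Assumption: "letters" means one or more uppercase A-Z characters.
// Alternation: four digits | HH:MM time | uppercase letters, anchored to the whole value.
$contentPattern = '/^(?:[0-9]{4}|[0-9]{2}:[0-9]{2}|[A-Z]+)$/';

foreach (['1234', '15:03', 'AAA', '12ab'] as $value) {
    // preg_match() returns 1 on a match and 0 otherwise.
    printf("%s => %d\n", $value, preg_match($contentPattern, $value));
}
```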

I am new to regular expressions, so my attempt might look very silly to those who are familiar with them.

Here is what I tried, but it did not return any result:

    $reg_ex = "/\<DIV STYLE\=\"top:([0-9])px; left:([0-9])px; width:([0-9])px\" Class\=\"S7\"\>(.*?)\<\/DIV\>/";
    $ret = preg_match($reg_ex,fgets($file),$outp);

It would be great if someone could help me.

Thanks a lot in advance!

Try:

    $reg_ex = "/<DIV STYLE=\"top:([0-9]+)px; left:([0-9]+)px; width:([0-9]+)px\" Class=\"S7\">(.*?)<\/DIV>/";
    $ret = preg_match($reg_ex, fgets($file), $outp);

[0-9] matches exactly one character from the range 0-9. However, you want to match one or more of those characters, so you have to use [0-9]+.
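The difference between [0-9] and [0-9]+ can be checked directly on the style attribute from the sample line. This is only an illustration; `$style`, `$single`, `$multi` and `$m` are hypothetical names.

```php
<?php
$style = 'top:214px; left:506px; width:88px';

// [0-9] matches exactly one digit, so "top:214px" cannot match:
// after "2" the pattern expects "px" but finds "1".
$single = preg_match('/top:([0-9])px/', $style);

// [0-9]+ matches one or more digits and captures "214" in group 1.
$multi = preg_match('/top:([0-9]+)px/', $style, $m);

echo $single, ' ', $multi, ' ', $m[1], PHP_EOL; // 0 1 214
```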

Also, you only need to escape characters that are special in the pattern (here, "/", because it is the pattern delimiter) with a backslash; "<", "=", and ">" need no escaping. Escaping " is needed for a different reason: the pattern is written inside a double-quoted PHP string, so its double quotes must be escaped for PHP itself.
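One way to avoid both kinds of escaping, offered as an alternative rather than what the answer uses: write the pattern in a single-quoted PHP string (so its double quotes need no backslashes) and pick a delimiter that does not occur in the pattern, such as `#` (so the "/" in "</DIV>" needs none either). When a literal string has to be embedded in a pattern, PHP's `preg_quote($str, $delimiter)` escapes it for you. The sample line is hard-coded here instead of being read with fgets().

```php
<?php
$line = '<DIV STYLE="top:214px; left:506px; width:88px" Class="S7">15:03</DIV>';

// Single quotes: the double quotes in the pattern need no backslashes.
// "#" as the delimiter: the "/" in "</DIV>" needs no backslash either.
$reg_ex = '#<DIV STYLE="top:([0-9]+)px; left:([0-9]+)px; width:([0-9]+)px" Class="S7">(.*?)</DIV>#';

$ret = preg_match($reg_ex, $line, $outp);

// $outp[1] .. $outp[4] hold top, left, width and the cell content.
echo implode(', ', array_slice($outp, 1)), PHP_EOL; // 214, 506, 88, 15:03
```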

In my example, the class is still hardcoded to S7. If you also need to parse it, use (.+) instead, which matches one or more arbitrary characters and captures them in a group.
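A sketch of the full loop around the answer's preg_match() call, reading the file line by line with fgets() and also capturing the class. It swaps the suggested (.+) for ([^"]+), which cannot run past the closing quote of the class attribute; that is a defensive choice, not part of the answer. It also assumes each DIV sits on its own line and uses an in-memory stream (`php://memory`) with two made-up lines in place of the real file; `$cells` is a hypothetical name.

```php
<?php
// Hypothetical sample input; in practice $file would come from fopen() on the HTML file.
$file = fopen('php://memory', 'r+');
fwrite($file, '<DIV STYLE="top:214px; left:506px; width:88px" Class="S7">15:03</DIV>' . "\n");
fwrite($file, '<DIV STYLE="top:230px; left:10px; width:40px" Class="S3">AAA</DIV>' . "\n");
rewind($file);

// ([^"]+) captures the class name and stops at its closing quote.
$reg_ex = '#<DIV STYLE="top:([0-9]+)px; left:([0-9]+)px; width:([0-9]+)px" Class="([^"]+)">(.*?)</DIV>#';

$cells = [];
while (($line = fgets($file)) !== false) {
    if (preg_match($reg_ex, $line, $outp)) {
        $cells[] = [
            'top'     => (int) $outp[1],
            'left'    => (int) $outp[2],
            'width'   => (int) $outp[3],
            'class'   => $outp[4],
            'content' => $outp[5],
        ];
    }
}
fclose($file);

foreach ($cells as $c) {
    // Prints e.g. "214,506 (88) S7: 15:03".
    printf("%d,%d (%d) %s: %s\n", $c['top'], $c['left'], $c['width'], $c['class'], $c['content']);
}
```

Collecting the coordinates as integers makes it straightforward to sort the cells by top and then left to rebuild the table row by row.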
