PHP regex: match data between HTML tags

I have created a regex that extracts the data I need, but it also includes the ">" character. How do I get rid of it? Here's the code.

<?php

// A scheme is required; without it PHP looks for a local file named www.example.com.
$content = file_get_contents('http://www.example.com');
$pattern = "/>([0-9]{2}\.[0-9]{3})/";
preg_match_all($pattern, $content, $matches);
echo $matches[0][2];

?>

And the HTML to extract from:

<td style="text-align:right" class="row">23.020</td>

It gives me ">23.020", but what I need is "23.020". I know it's a n00b question, but how do I get rid of the ">"?

$content = '<td style="text-align:right" class="row">23.020</td>';
$pattern = "/>([0-9]{2}\.[0-9]{3})/";
preg_match_all($pattern, $content, $matches);
var_dump($matches);

will give you:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(7) ">23.020"
  }
  [1]=>
  array(1) {
    [0]=>
    string(6) "23.020"
  }
}

So simply use $matches[1][0].

If you want to require something in a regex without making it part of the match, you can use an "assertion". For your string that would be a (?<=[>]) lookbehind, which can be written more simply as (?<=>):

 /(?<=>)([0-9]{2}\.[0-9]{3})/
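As a rough sketch of how that lookbehind behaves, the snippet below runs it against the single table cell from the question. The capture group is dropped here purely for illustration, so that the full match at index 0 is already the bare number.

```php
<?php
// Illustrative input only: the cell markup is copied from the question.
$content = '<td style="text-align:right" class="row">23.020</td>';

// (?<=>) asserts that a ">" precedes the number without consuming it.
$pattern = '/(?<=>)[0-9]{2}\.[0-9]{3}/';

preg_match_all($pattern, $content, $matches);

// With no capture group, index 0 (the full match) is the number itself.
$value = $matches[0][0];
echo $value, PHP_EOL; // 23.020
```

The trade-off is mostly readability: the lookbehind keeps the ">" out of the match, while a capture group leaves the pattern simpler and only changes which index you read.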

In your case, however, you already have a capture group that excludes the leading >. You just need to access the right result group:

 echo $matches[1][2];

The [1] refers to the inner (...) parenthesized group, whereas your [0] returns the complete match.
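To make the indexing concrete, here is a small sketch that assumes the page contains at least three matching cells. The three-cell string is a made-up sample (only the cell markup style comes from the question), and the second index [2] simply selects the third match, as in the original code.

```php
<?php
// Hypothetical three-cell sample standing in for the downloaded page.
$content = '<td class="row">11.111</td>'
         . '<td class="row">22.222</td>'
         . '<td class="row">23.020</td>';

$pattern = '/>([0-9]{2}\.[0-9]{3})/';
preg_match_all($pattern, $content, $matches);

$full  = $matches[0][2]; // complete third match, including the leading ">"
$group = $matches[1][2]; // first capture group of the third match
echo $full, PHP_EOL;  // >23.020
echo $group, PHP_EOL; // 23.020
```

With the default PREG_PATTERN_ORDER flag, the first index picks the group (0 for the whole match, 1 for the first parenthesized group) and the second index picks which occurrence in the input.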
