PHP regex to match data between HTML tags

Question

I have created a regex that extracts the data I need, but it also includes the ">" character. How do I get rid of it? Here's the code.

<?php

$content = file_get_contents('http://www.example.com'); // a scheme is needed, otherwise PHP looks for a local file
$pattern = "/>([0-9]{2}\.[0-9]{3})/";
preg_match_all($pattern, $content, $matches);
echo $matches[0][2];

?>

And the HTML to extract from:

<td style="text-align:right" class="row">23.020</td>

It gives me ">23.020", but what I need is "23.020". I know it's a n00b question, but how do I get rid of the ">"?

Answer 1

$content = '<td style="text-align:right" class="row">23.020</td>';
$pattern = "/>([0-9]{2}\.[0-9]{3})/";
preg_match_all($pattern, $content, $matches);
var_dump($matches);

will give you:

array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(7) ">23.020"
  }
  [1]=>
  array(1) {
    [0]=>
    string(6) "23.020"
  }
}

So simply use $matches[1][0].

Answer 2

If you want to require something in a regex without making it part of the match, you can use an "assertion". For your string it would be a (?<=[>]) lookbehind, which is the same as (?<=>) because the character class holds a single character:

 /(?<=>)([0-9]{2}\.[0-9]{3})/
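A minimal sketch of the lookbehind variant, run against the single table cell from the question. The capture group is dropped here as an illustration choice, because the lookbehind already keeps the ">" out of the full match:

```php
<?php
// Sample cell copied from the question.
$content = '<td style="text-align:right" class="row">23.020</td>';

// (?<=>) asserts that ">" precedes the number without consuming it,
// so the full match in $matches[0] no longer contains ">".
$pattern = '/(?<=>)[0-9]{2}\.[0-9]{3}/';

preg_match_all($pattern, $content, $matches);

echo $matches[0][0], PHP_EOL; // prints 23.020
```
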

In your case, however, you already have a capture group that excludes the leading >. You just need to access the right result group:

 echo $matches[1][2];

The [1] refers to the inner (...) parens group, whereas your [0] returns the complete match.
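To make the difference between the two indexes concrete, here is a small sketch. The three-cell HTML string is a made-up sample (the question only shows one cell), chosen so that index 2, the third match, actually exists:

```php
<?php
// Hypothetical sample with three cells, so that index 2 (the third match) exists.
$content = '<td class="row">11.500</td>'
         . '<td class="row">47.125</td>'
         . '<td class="row">23.020</td>';
$pattern = '/>([0-9]{2}\.[0-9]{3})/';

preg_match_all($pattern, $content, $matches);

// Guard the index: with fewer than three matches, reading [2] would raise a warning.
if (isset($matches[1][2])) {
    echo $matches[0][2], PHP_EOL; // complete match, prints >23.020
    echo $matches[1][2], PHP_EOL; // capture group only, prints 23.020
}
```

The isset() check is an addition for safety; on a real page the number of matching cells may differ from what you expect.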

Answer 1
2 2012-01-16 16:29:35

Answer 2
1 2012-01-16 16:28:41
