Regex to match weights (lbs) inside different HTML tags

Question

I have an HTML file that contains product information, including weights. I am trying to extract the weights (any number that precedes lbs). Occasionally there is a space between the number and lbs. I came up with this regex:

preg_match(">[0-9]+(\\.[0-9][0-9]?)(.*?)lbs/i",fgets($file),$matches);

but it returns everything between the first '>' and 'lbs', which is not practical since there are a lot of tags involved. What I am trying to accomplish is to get only the number between the '>' that directly precedes the weight and the 'lbs' that follows it, ignoring any space in between.

So in the example below, I want to get 0.94, 0.12, 0.94.

<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">0.12lbs
<td width="513" valign="top">0.94LBS
<td width="513" valign="top">penguin lover

Notice that the tag ' <td width="513" valign="top"> ' also precedes content other than weights.

Any thoughts or help will be appreciated.

Answer 1

Use:

/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i

This uses a lookahead and a lookbehind, so the only thing matched is the decimal number.

Explanation:

(?<=>) Lookbehind that checks for a preceding >; in general, (?<=xxx) means "look behind for xxx".

[0-9]+(?:\.[0-9][0-9]?) Your decimal regex, unchanged except that it now uses a non-capturing group (?:xxx).

(?=\s*lbs) Lookahead for zero or more whitespace characters followed by lbs.

Note that you can replace each [0-9] with \d if you prefer; they are equivalent in this pattern.
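To make the effect of the lookarounds concrete, here is a minimal sketch that compares the pattern with a version that uses a capture group instead. The sample string and the variable names are assumptions made for illustration only.

```php
<?php
// Illustrative input (assumed): one table cell with a space before "lbs".
$cell = '<td width="513" valign="top">0.94 lbs';

// With lookarounds, ">" and "lbs" are checked but not consumed,
// so the whole match ($m[0]) is just the number.
preg_match('/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i', $cell, $m);
$withLookarounds = $m[0];

// Without lookarounds, ">" and "lbs" become part of the match,
// and the number has to be read from capture group 1 instead.
preg_match('/>([0-9]+(?:\.[0-9][0-9]?))\s*lbs/i', $cell, $n);
$wholeMatch = $n[0];
$captured   = $n[1];

var_dump($withLookarounds, $wholeMatch, $captured);
```

Both approaches find the same number; the lookaround version simply saves you from indexing into a capture group.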

Example code:

$str = '<td width="513" valign="top">0.94 lbs
        <td width="513" valign="top">0.12lbs
        <td width="513" valign="top">0.94LBS
        <td width="513" valign="top">penguin lover';

// Collect every number that directly follows ">" and precedes optional whitespace plus "lbs"
preg_match_all('/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i', $str, $matches);

print_r($matches[0]);

Output:

Array
(
    [0] => 0.94
    [1] => 0.12
    [2] => 0.94
)
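Since the question reads the file with fgets, the same pattern can also be applied one line at a time. The following is only a sketch: the helper name extract_weights is hypothetical, and an in-memory stream stands in for the real HTML file.

```php
<?php
// Hypothetical helper (not from the original answer): collect every weight
// found in an already-open file handle, reading one line at a time.
function extract_weights($handle): array
{
    $pattern = '/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i';
    $weights = [];
    while (($line = fgets($handle)) !== false) {
        // preg_match_all returns the number of matches; 0 means none on this line.
        if (preg_match_all($pattern, $line, $matches) > 0) {
            $weights = array_merge($weights, $matches[0]);
        }
    }
    return $weights;
}

// Demo using an in-memory stream in place of a real file (assumption).
$handle = fopen('php://memory', 'w+');
fwrite($handle, "<td width=\"513\" valign=\"top\">0.94 lbs\n"
              . "<td width=\"513\" valign=\"top\">0.12lbs\n"
              . "<td width=\"513\" valign=\"top\">0.94LBS\n"
              . "<td width=\"513\" valign=\"top\">penguin lover\n");
rewind($handle);

$weights = extract_weights($handle);
fclose($handle);
print_r($weights);
```

Reading line by line will miss a weight whose number and "lbs" sit on different lines; if that can happen in your file, run preg_match_all on the whole contents instead.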

Answer 2

preg_match_all('/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i', $html, $matches);
print_r($matches[0]);

Regular expression:

[0-9]+         any character of: '0' to '9' (1 or more times)
(?:            group, but do not capture:
  \.           '.'
  [0-9]+       any character of: '0' to '9' (1 or more times)
)              end of grouping
(?=            look ahead to see if there is:
  \s*          whitespace (\n, \r, \t, \f, and " ") (0 or more times)
  lbs          'lbs'
)              end of look-ahead

See working demo
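As a rough illustration of how this pattern behaves, the sketch below runs it next to a variant whose decimal group is made optional with a trailing "?". The input string and the variant are assumptions added for comparison, not part of the original answer.

```php
<?php
// Illustrative input (assumed): an integer weight, a weight inside running
// text, and a weight with no space before "lbs".
$html = '<td>5 lbs</td><td>about 2.5 lbs</td><td>0.12lbs</td>';

// Pattern as written in this answer: the decimal part is required.
preg_match_all('/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i', $html, $required);

// Variant with a trailing "?" on the group, so integers also match.
preg_match_all('/[0-9]+(?:\.[0-9]+)?(?=\s*lbs)/i', $html, $optional);

print_r($required[0]); // 2.5, 0.12
print_r($optional[0]); // 5, 2.5, 0.12
```

Because this pattern has no lookbehind for ">", it also picks up "2.5" from the middle of a cell's text. Whether that is desirable depends on how strictly the weight must follow the tag.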

2 answers

Answer 1
2 2013-11-22 17:27:20

Answer 2
1 2013-11-22 17:39:59
