Regular expression to match weight (lbs) inside different HTML tags
I have an HTML file that contains product information, including weights. I am trying to extract the weights (any number that precedes lbs). Occasionally there is a space between the number and lbs. I came up with this regex:
preg_match(">[0-9]+(\\.[0-9][0-9]?)(.*?)lbs/i",fgets($file),$matches);
But this returns everything between the first '>' and 'lbs', which is not practical because many tags are involved. What I want is only the number between the '>' that directly precedes the weight and the 'lbs' that follows it, ignoring any space in between.
In the example below, I want to get 0.94, 0.12, 0.94. Any help is appreciated.
<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">0.12lbs
<td width="513" valign="top">0.94LBS
<td width="513" valign="top">penguin lover
Notice that the tag '<td width="513" valign="top">' also precedes text other than the weight.
Any thoughts or help will be appreciated.
Use:
/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i
This uses a lookahead and a lookbehind so that the only thing matched is the decimal number.
Explanation:
(?<=>)
Lookbehind to check for a >. (?<=xxx) means "look behind for xxx".
[0-9]+(?:\.[0-9][0-9]?)
Your decimal regex, unchanged except that it uses a non-capturing group (?:xxx).
(?=\s*lbs)
Lookahead for zero or more whitespace characters followed by lbs.
Note that you can replace each [0-9] with \d if you want; they are equivalent here.
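As a quick check of that equivalence on the sample rows from the question, here is a minimal sketch. The helper name extract_weights and its $digit parameter are hypothetical and only exist for this illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper: builds the pattern from the answer with a chosen
// digit token ('[0-9]' or '\d') and returns every matched number.
function extract_weights(string $html, string $digit): array
{
    $pattern = '/(?<=>)' . $digit . '+(?:\.' . $digit . $digit . '?)(?=\s*lbs)/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

// Sample rows copied from the question.
$html = implode("\n", [
    '<td width="513" valign="top">0.94 lbs',
    '<td width="513" valign="top">0.12lbs',
    '<td width="513" valign="top">0.94LBS',
    '<td width="513" valign="top">penguin lover',
]);

$withClass = extract_weights($html, '[0-9]');
$withShorthand = extract_weights($html, '\d');

// Both variants find the same three numbers on this input.
var_dump($withClass === $withShorthand);
```

Single-quoted strings are used so the backslashes reach the regex engine unchanged.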
Example code:
$str = '<td width="513" valign="top">0.94 lbs
<td width="513" valign="top">0.12lbs
<td width="513" valign="top">0.94LBS
<td width="513" valign="top">penguin lover';
preg_match_all("/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i",$str,$matches);
print_r($matches[0]);
Output:
Array ( [0] => 0.94 [1] => 0.12 [2] => 0.94 )
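Since the question reads the file with fgets, the same pattern can probably be applied one line at a time as well. A minimal sketch, assuming each weight sits on its own line; the in-memory stream below is only a stand-in for the asker's real file handle, and the $weights array is a name chosen for this illustration.

```php
<?php
// Stand-in for the real HTML file: an in-memory stream holding the sample rows.
$file = fopen('php://memory', 'w+');
fwrite($file, implode("\n", [
    '<td width="513" valign="top">0.94 lbs',
    '<td width="513" valign="top">0.12lbs',
    '<td width="513" valign="top">0.94LBS',
    '<td width="513" valign="top">penguin lover',
]));
rewind($file);

$pattern = '/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i';
$weights = [];

// fgets returns false at end of file, which ends the loop.
while (($line = fgets($file)) !== false) {
    // preg_match returns 1 when the line contains a weight;
    // $m[0] then holds only the number, thanks to the lookarounds.
    if (preg_match($pattern, $line, $m) === 1) {
        $weights[] = (float) $m[0];
    }
}
fclose($file);

print_r($weights);
```

If a single line can hold several weights, preg_match_all on each line would be the safer choice.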
preg_match_all('/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i', $html, $matches);
print_r($matches[0]);
Regular expression:
[0-9]+      any character of: '0' to '9' (1 or more times)
(?:         group, but do not capture:
  \.          '.'
  [0-9]+      any character of: '0' to '9' (1 or more times)
)           end of grouping
(?=         look ahead to see if there is:
  \s*         whitespace (\n, \r, \t, \f, and " ") (0 or more times)
  lbs         'lbs'
)           end of look-ahead
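The two patterns in this thread differ mainly in the lookbehind. The sketch below compares them on an input where a number is followed by lbs but not directly preceded by '>'. The sample string is made up for this illustration and does not come from the question.

```php
<?php
// Hypothetical input: the second cell has text between '>' and the number.
$html = '<td>0.94 lbs</td><td>approx. 2.5 lbs</td>';

$withLookbehind = '/(?<=>)[0-9]+(?:\.[0-9][0-9]?)(?=\s*lbs)/i';
$withoutLookbehind = '/[0-9]+(?:\.[0-9]+)(?=\s*lbs)/i';

preg_match_all($withLookbehind, $html, $a);
preg_match_all($withoutLookbehind, $html, $b);

print_r($a[0]); // only the number that directly follows '>'
print_r($b[0]); // every decimal number followed by lbs
```

Which behavior is preferable depends on whether weights always start right after the tag, as they do in the question's sample.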
See working demo