How to extract data (strings) in PHP using regex?
I have tried to extract the data like this:
$str = "Instant Oatmeal - Corn Flavour 175g (35g x 5)";
preg_match('/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*) \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/', $str, $m);
It works correctly:
Instant Oatmeal - Corn Flavour 175g (35g x 5)
name : Instant Oatmeal - Corn Flavour
total_weight : 175 g
#portion : 5
unit weight : 35 g
However, if I try to extract this:
$str = "Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)";
It gives the wrong result:
Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)
name : Cholcolate Sandwich Cookies (Tray)
total_weight : 264 .6g
#portion : 9
unit weight : 29 .4g
How can I solve this?
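A minimal script to reproduce the problem, using the question's original pattern unchanged on both sample strings. It shows that `\d+` only matches digits, so it stops at the decimal point and the fractional part (".6") ends up in the greedy `.*` unit group.

```php
<?php
// Reproduces the problem with the question's original pattern (unchanged).
$pattern = '/(?P<name>.*) (?P<total_weight>\d+)(?P<total_weight_unit>.*) \((?P<unitWeight>\d+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';

$inputs = [
    'Instant Oatmeal - Corn Flavour 175g (35g x 5)',
    'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)',
];

$results = [];
foreach ($inputs as $str) {
    if (preg_match($pattern, $str, $m) === 1) {
        $results[] = $m;
        // \d+ matches digits only, so it stops at "." and the
        // fractional part spills into the greedy .* unit group.
        printf("%s => %s | %s\n", $str, $m['total_weight'], $m['total_weight_unit']);
    }
}
```

For the second string this prints a weight of `264` and a unit of `.6g`, exactly the split shown in the question.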
When dealing with non-trivial regexes like this one, you can dramatically improve readability (and maintainability) by writing them in free-spacing format with lots of comments (and indentation for any nested parentheses). Here is your original regex in free-spacing format with comments:
$re_orig = '/# Original regex with added comments.
    (?P<name>.*)                # $name:
    [ ]                         # Space separates name from weight.
    (?P<total_weight>\d+)       # $total_weight:
    (?P<total_weight_unit>.*)   # $total_weight_unit:
    [ ]                         # Space separates total units from portions data.
    \(                          # Literal parens enclosing portions data.
    (?P<unitWeight>\d+)         # $unitWeight:
    (?P<unitWeight_unit>.*)     # $unitWeight_unit:
    [ ]x[ ]                     # "space-X-space" separates portions data.
    (?P<portion_no>\d+)         # $portion_no:
    \)                          # Literal parens enclosing portions data.
    /x';
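The free-spacing pattern writes every literal space as `[ ]` because, under the `x` modifier, PCRE ignores unescaped whitespace outside a character class and treats `#` as the start of a comment running to the end of the line. A small sketch with made-up inputs (not from the question) showing both effects:

```php
<?php
// Under the x (free-spacing) modifier, unescaped whitespace outside a
// character class is ignored and "#" starts a comment to end of line.
$spaceIgnored = preg_match('/a b/x', 'a b');   // pattern is effectively /ab/, so no match
$spaceJoined  = preg_match('/a b/x', 'ab');    // matches "ab"
$spaceInClass = preg_match('/a[ ]b/x', 'a b'); // [ ] keeps the literal space
$commented    = preg_match('/\d+  # one or more digits/x', 'abc 42', $m);

printf("%d %d %d %s\n", $spaceIgnored, $spaceJoined, $spaceInClass, $m[0]);
```

This prints `0 1 1 42`. Writing an escaped space (`\ `) would also work; `[ ]` is simply easier to spot when reading the pattern.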
Here is an improved version:
$re_improved = '/# Match Name, total weight, units and portions data.
    ^                           # Anchor to start of string.
    (?P<name>.*?)               # $name:
    [ ]+                        # Space(s) separate name from weight.
    (?P<total_weight>           # $total_weight:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    (?P<total_weight_unit>      # $total_weight_unit:
      .+?                       # Units consist of any chars.
    )
    [ ]+                        # Space(s) separate total from portions.
    \(                          # Literal parens enclosing portions data.
    (?P<unitWeight>             # $unitWeight:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    (?P<unitWeight_unit>        # $unitWeight_unit:
      .+?                       # Units consist of any chars.
    )
    [ ]+x[ ]+                   # "space-X-space" separates portions data.
    (?P<portion_no>             # $portion_no:
      \d+                       # Required integer portion.
      (?:\.\d*)?                # Optional fractional portion.
    )
    \)                          # Literal parens enclosing portions data.
    $                           # Anchor to end of string.
    /xi';
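To check the improved pattern against both sample strings, here is a sketch that uses a compact single-line restatement of `$re_improved` (same groups and quantifiers, written without the `x` modifier so the block stays short). The restatement and the `$parsed` array are illustrative, not part of the original answer.

```php
<?php
// Compact, single-line restatement of $re_improved (same groups, no x modifier).
$re = '/^(?P<name>.*?)[ ]+(?P<total_weight>\d+(?:\.\d*)?)(?P<total_weight_unit>.+?)[ ]+'
    . '\((?P<unitWeight>\d+(?:\.\d*)?)(?P<unitWeight_unit>.+?)[ ]+x[ ]+'
    . '(?P<portion_no>\d+(?:\.\d*)?)\)$/i';

$inputs = [
    'Instant Oatmeal - Corn Flavour 175g (35g x 5)',
    'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)',
];

$parsed = [];
foreach ($inputs as $str) {
    if (preg_match($re, $str, $m) === 1) {
        // preg_match also returns numbered groups; keep only the named (string) keys.
        $parsed[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}
print_r($parsed);
```

The optional `(?:\.\d*)?` lets `264.6` and `29.4` be captured whole, while the lazy `.*?` name and the `^`/`$` anchors keep "(Tray)" inside the name.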
Notes:
The i (ignore case) modifier is set in case the X in the portions data is uppercase.
Each weight group now accepts an optional fractional part, (?:\.\d*)?, so a value such as 264.6 is captured whole instead of being split at the decimal point.
I'm not sure how you are applying this regex, but this improved regex should solve your immediate problem.
Edit (2011-10-09 11:17 MDT): Changed the expression for units to be more lax, to allow for cases pointed out by Ilmari Karonen.
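To illustrate the note about the `i` modifier, here is a sketch that runs a compact restatement of the improved pattern with and without `i` on a hypothetical product string whose separator is an uppercase "X" (the string is made up for this example).

```php
<?php
// Compact form of the improved pattern, without delimiters and modifiers.
$core = '^(?P<name>.*?)[ ]+(?P<total_weight>\d+(?:\.\d*)?)(?P<total_weight_unit>.+?)[ ]+'
      . '\((?P<unitWeight>\d+(?:\.\d*)?)(?P<unitWeight_unit>.+?)[ ]+x[ ]+'
      . '(?P<portion_no>\d+(?:\.\d*)?)\)$';

// Hypothetical product string with an uppercase "X" separator.
$str = 'Sample Crackers 100g (25g X 4)';

$withoutI = preg_match('/' . $core . '/', $str);        // literal "x" does not match "X"
$withI    = preg_match('/' . $core . '/i', $str, $m);   // with i, "x" also matches "X"

printf("without i: %d, with i: %d, portions: %s\n", $withoutI, $withI, $m['portion_no'] ?? '-');
```

Without `i` the match fails outright; with it, the portion count `4` is captured.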
Use this:
/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/
Your problem is that you are not taking decimal (floating-point) numbers into account. I corrected this.
Note that the portion count is still an integer, but I guess that is logical :)
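A quick check of this answer's pattern, copied unchanged, against both sample strings. Each weight is matched by `\b[0-9]*\.?[0-9]+`: optional digits, an optional ".", then at least one digit, so both `175` and `264.6` are captured whole.

```php
<?php
// Pattern from this answer, unchanged.
$pattern = '/(?P<name>.*) (?P<total_weight>\b[0-9]*\.?[0-9]+)(?P<total_weight_unit>.*) \((?P<unitWeight>\b[0-9]*\.?[0-9]+)(?P<unitWeight_unit>.*) x (?P<portion_no>\d+)\)/';

$inputs = [
    'Instant Oatmeal - Corn Flavour 175g (35g x 5)',
    'Cholcolate Sandwich Cookies (Tray) 264.6g (29.4g x 9)',
];

$results = [];
foreach ($inputs as $str) {
    if (preg_match($pattern, $str, $m) === 1) {
        $results[] = $m;
        // e.g. "name => 264.6g, 9 x 29.4g"
        printf("%s => %s%s, %s x %s%s\n", $m['name'], $m['total_weight'], $m['total_weight_unit'],
            $m['portion_no'], $m['unitWeight'], $m['unitWeight_unit']);
    }
}
```

Unlike the anchored version in the other answer, this pattern relies on the greedy `.*` name backtracking to the last number followed by a parenthesized portion block, which works for these inputs.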