How to properly regex match the following string in Python?
I have the following string:
1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]
I am trying to get the following groups:
Item - 1
Food - Baby Carrots (4Kids) (3 DOLLARS)
Cost - 3
Extra - 0
required - 5
The following is my current pattern, which is not matching anything:
'(?P<item>.+?)\-(?P<food>.*)\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]'
What is wrong with my attempt?
Your original regex:
(?P<item>.+?)\-(?P<food>.*)\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]
Your problems are mostly due to the fact that you are searching for any character instead of specific ones (digits and static strings). For example, why do you use
(?P<item>.+?)
if it's only going to be numbers? Change it to
(?P<item>[0-9]+)
The reluctant (lazy) +? quantifier is not necessary in this case, since you always want the entire number. That is, the next portion of the match will never start in the middle of that number.
In addition, this should be anchored to the start of the line (input):
^(?P<item>[0-9]+)
You don't need to escape the dash (although it doesn't hurt):
^(?P<item>[0-9]+)-
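A quick check in Python 3 (a sketch using only the standard re module; the "12- ..." string is a made-up multi-digit variant of your input) shows that the anchored prefix with an unescaped dash picks up the item, and that the lazy and greedy forms agree because the literal - has to come right after the digits:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

# Anchored item prefix; a dash outside a character class needs no escaping.
greedy = re.match(r"^(?P<item>[0-9]+)-", s)
lazy = re.match(r"^(?P<item>[0-9]+?)-", s)
print(greedy.group("item"), lazy.group("item"))  # 1 1

# Hypothetical multi-digit item: the lazy version still has to expand to
# "12", because "-" must follow the digits, so reluctance buys nothing here.
multi = "12- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"
print(re.match(r"^(?P<item>[0-9]+?)-", multi).group("item"))  # 12
```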
Your food group (heh) is the most complicated part:
(?P<food>.*)
It doesn't contain just any character. Based on your demo input, it only has letters, spaces, numbers, and parens, so search for just those:
(?P<food>[\w0-9 ()]+)
Here's what we have so far:
^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+)
You'll see that this also matches the cost part (which is completely missing from your regex... I assume that's just an oversight).
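To see the problem concretely, here is a small Python 3 sketch with the partial pattern (item written as [0-9]+). Because the class also allows digits, spaces, and parens, the greedy + runs straight through the cost and only stops at the [:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

partial = r"^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+)"
m = re.match(partial, s)
# Nothing after the food group forces it to back off, so it keeps
# "(3 DOLLARS) " (including the trailing space) and stops at "[".
print(repr(m.group("food")))  # 'Baby Carrots (4Kids) (3 DOLLARS) '
```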
So add the cost, which is a number wrapped in ( and [space]DOLLARS), but only capture the number:
^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\)
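A minimal Python 3 check of that step (same input string, standard re only): the literal " (<digits> DOLLARS)" after the food group makes the greedy food match back off until the cost can match, so food and cost now come out separately:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

pattern = r"^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\)"
m = re.match(pattern, s)
# food backtracks from "... (3 DOLLARS) " to the longest prefix that is
# followed by " (<digits> DOLLARS)", which is "Baby Carrots (4Kids)".
print(m.group("food"), "|", m.group("cost"))  # Baby Carrots (4Kids) | 3
```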
The rest of your regex works fine, actually, and it can be added to the end as is, after a single space (the input has a space between DOLLARS) and [EXTRA):
\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]
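Here is a Python 3 sketch of gluing the two pieces together (the names head and tail are just for illustration). Without the joining space there is no match at all; with it, your original tail picks up extra and required:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

head = r"^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\)"
tail = r"\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]"

# The input has a space between "DOLLARS)" and "[EXTRA", so the pieces
# must be joined with a literal space; otherwise re.match() returns None.
print(re.match(head + tail, s))  # None
m = re.match(head + " " + tail, s)
print(m.group("extra"), m.group("required"))  # 0 5
```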
I'd recommend, however, changing the first .*? to EXTRA[space] if that text is indeed always found there (and again, there is no need for reluctance in this case). Do the same with [space]COUNT for the .* before each \], and with REQUIRED[space] for the second .*?. The more you narrow things down, the easier your regex will be to debug, assuming your input really is that restricted.
Here's the final version (with an end-of-line anchor as well):
^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\) \[EXTRA (?P<extra>\d+(\.\d+)?) COUNT\]; \[REQUIRED (?P<required>\d+(\.\d+)?) COUNT\]$
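For completeness, a Python 3 sketch of using the final pattern (split across adjacent raw strings only for readability; Python concatenates them into the same pattern). groupdict() returns just the named groups, so the unnamed (\.\d+)? groups don't show up:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

final = re.compile(
    r"^(?P<item>[0-9]+)- (?P<food>[\w0-9 ()]+) \((?P<cost>[0-9]+) DOLLARS\) "
    r"\[EXTRA (?P<extra>\d+(\.\d+)?) COUNT\]; "
    r"\[REQUIRED (?P<required>\d+(\.\d+)?) COUNT\]$"
)

m = final.match(s)
# Only named groups appear; the values are strings, convert them as needed.
print(m.groupdict())
# {'item': '1', 'food': 'Baby Carrots (4Kids)', 'cost': '3', 'extra': '0', 'required': '5'}
```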
Before analyzing your regex, this is what I came up with:
(?P<item>[0-9]+)- (?P<food>[\w ()]+) \((?P<cost>[0-9]+) DOLLARS\) \[EXTRA (?P<extra>[0-9]+) COUNT\]; \[REQUIRED (?P<required>[0-9]+) COUNT\]
All these links came from the Stack Overflow Regular Expressions FAQ.
Like this:
(?P<item>.+?)\-\s(?P<food>.*?\)).*?\((?P<cost>\d)\s\w+\)\s\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]
Demo here: http://regex101.com/r/qD1rL9
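The same pattern can be tried locally with Python 3's re.search instead of the online demo (a sketch; the pattern is only split across two raw strings for line length). Note that (?P<cost>\d) has no quantifier, so as written it captures exactly one digit:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

pattern = (
    r"(?P<item>.+?)\-\s(?P<food>.*?\)).*?\((?P<cost>\d)\s\w+\)\s"
    r"\[.*?(?P<extra>\d+(\.\d+)?).*\].*\[.*?(?P<required>\d+(\.\d+)?).*\]"
)
m = re.search(pattern, s)
# The lazy .*?\) stops food at the first closing paren, after "(4Kids)".
print(m.group("item"), m.group("food"), m.group("cost"), m.group("extra"), m.group("required"))
# 1 Baby Carrots (4Kids) 3 0 5
```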
As mentioned above, you are missing a capture for cost. You also need to make the food capture non-greedy and include the closing paren. My version:
(?P<Item>\d)-\s*(?P<Food>.*?\))\s*\((?P<Cost>\d*).*EXTRA\s*(?P<Extra>\d*).*REQUIRED\s*(?P<Required>\d*)
{'Food': 'Baby Carrots (4Kids)', 'Item': '1', 'Required': '5', 'Extra': '0', 'Cost': '3'}
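That dict looks like the output of groupdict(). A Python 3 sketch reproducing it with re.search (the comparison is on dict contents, since the key order printed above may differ). Keep in mind \d for Item allows a single digit only, and the \d* groups can also match an empty string:

```python
import re

s = "1- Baby Carrots (4Kids) (3 DOLLARS) [EXTRA 0 COUNT]; [REQUIRED 5 COUNT]"

pattern = (
    r"(?P<Item>\d)-\s*(?P<Food>.*?\))\s*\((?P<Cost>\d*).*"
    r"EXTRA\s*(?P<Extra>\d*).*REQUIRED\s*(?P<Required>\d*)"
)
m = re.search(pattern, s)

# Expected values copied from the dict shown above.
expected = {"Food": "Baby Carrots (4Kids)", "Item": "1", "Required": "5", "Extra": "0", "Cost": "3"}
print(m.groupdict() == expected)  # True
```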
It seems a bit faster, tested with http://www.pythonregex.com/