Regex, get prices. But dots or commas for decimals
I want to get the prices out of text using regex.
Small example: "This great product only for €1.000,59 today!"
I would like to get the price from the text as written above. This is my Python regex so far:
re.findall(r'([0-9,.]*)', text)
There is only a small problem. Some texts use commas (,) to separate the decimals, others use dots (.), and some don't have decimals at all or replace the 00 decimals with a dash (-), like €59,-
So the ideal situation, to get all prices without any problem, would be (in my opinion):
If both are NO: remove all dots and commas.
If one of the two questions is YES: if the decimals are separated by a dot (.), replace that dot with a comma; if it's already a comma, leave it as it is. Then remove the rest of the commas and dots.
Is that possible with regex?
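As a starting point, the formats described above can be matched with one pattern before any normalizing is done. This is only a minimal sketch: it assumes thousands groups are always three digits and that decimals are one or two digits or a dash. `PRICE_RE` and `find_prices` are illustrative names, not part of the question.

```python
import re

# Assumed price shape: digits, optional 3-digit thousands groups,
# then an optional decimal part (1-2 digits) or a dash as in "59,-".
PRICE_RE = re.compile(r'\d+(?:[.,]\d{3})*(?:[.,](?:\d{1,2}|-))?')

def find_prices(text):
    """Return the raw price strings found in text, unmodified."""
    return PRICE_RE.findall(text)

sample = "Only €1.000,59 today, or €59,- or 2,456,777.00 or 1000"
print(find_prices(sample))
```

Unlike `[0-9,.]*`, this pattern cannot match the empty string, so the result contains no empty entries.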
Edit:
Sorry, I did not read the problem description carefully enough. I think solving the problem takes two regex steps: first a re.sub() to normalize the separators, then a re.findall() to pull out the prices.
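import re

# A separator (dot or comma) followed by up to three digits or a dash.
pattern = re.compile(r'(([.,]{1})(\d{1,3}|-))')
s = "2,456,777.00 xxxxxxxxxxxxx 59,789,- xxxxxxxxxxxx 59,- xxxxxxxxxx 1.000,59"

def subs(m):
    g0 = m.group(0)
    g3 = m.group(3)
    if g3 == '-':
        g3 = '00'
    if len(g0) == 4:
        # Separator plus three digits: a thousands group, so use a comma.
        return ',' + g0[1:4]
    else:
        # Anything shorter is the decimal part, so use a dot.
        return '.' + g3

c = re.findall(r'[\d.,-]+', re.sub(pattern, subs, s))
print(c)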
>> ['2,456,777.00', '59,789.00', '59.00', '1,000.59']
A little cumbersome indeed. Hope someone can come up with a smarter one.
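One possible alternative, offered as a sketch rather than a definitive answer: treat the last separator as the decimal mark only when it is followed by one or two digits or a dash, and strip every other separator. The helper name `parse_price` is hypothetical, and the sketch assumes a three-digit group such as "1.000" always means thousands, which is a guess that will be wrong for some inputs.

```python
import re
from decimal import Decimal

# Group 1: integer part with optional 3-digit thousands groups.
# Group 2: optional decimals (1-2 digits) or a dash standing in for "00".
_PRICE = re.compile(r'(\d+(?:[.,]\d{3})*)(?:[.,](\d{1,2}|-))?')

def parse_price(raw):
    """Convert one raw price string such as '1.000,59' to a Decimal."""
    m = _PRICE.fullmatch(raw)
    if m is None:
        raise ValueError("not a recognised price: %r" % raw)
    whole = re.sub(r'[.,]', '', m.group(1))  # drop thousands separators
    frac = m.group(2)
    if frac is None or frac == '-':
        frac = '00'
    return Decimal(whole + '.' + frac)

for raw in ["2,456,777.00", "59,789,-", "59,-", "1.000,59"]:
    print(raw, '->', parse_price(raw))
```

Returning a `Decimal` instead of a reformatted string sidesteps the question of which separator to keep in the output and avoids float rounding on money values.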