Regex, get prices. But dots or commas for decimals

I want to extract prices from text using regex.

Small example: "This great product only for €1.000,59 today!"

I would like to get the price from the text as written above. This is my Python regex so far:

re.findall(ur'([0-9,.]*)', text)

There is only one small problem. Some texts use a comma (,) as the decimal separator, others use a dot (.), and some have no decimals at all or replace the 00 decimals with a dash (-), like €59,-
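As a minimal sketch, the variants listed above could be matched with one pattern. It assumes a price starts with a digit, that thousands groups are exactly three digits, and that the decimal part is either exactly two digits or a dash. The name `PRICE_RE` and the sample text are hypothetical.

```python
import re

# Hypothetical pattern: leading digits, optional groups of
# separator + 3 digits, then an optional decimal part
# (separator + exactly two digits, or separator + dash).
PRICE_RE = re.compile(r'\d+(?:[.,]\d{3})*(?:[.,](?:\d{2}(?!\d)|-))?')

text = "Only €1.000,59 today, was €59,- or $2,456,777.00, now 1234"

# No capturing groups are used, so findall returns the full matches.
prices = PRICE_RE.findall(text)
print(prices)
```

This only locates the price strings; deciding which separator is the decimal one is a separate step.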

So the ideal way to get all prices without any problem would be (in my opinion) to check two things:

  • Reading the number from right to left, is the 3rd character a dot or a comma (no price has more than 2 decimals)?
  • Does it contain a dash (like €50,-)?

If both are NO: remove all dots and commas. If either is YES: if the decimals are separated by a dot (.), replace that dot with a comma; if it is already a comma, leave it as is. Then remove the remaining commas and dots.

Is that possible with regex?
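The two checks described above can be sketched as a small function applied to each price string that has already been extracted. The function name `normalize_price` is hypothetical, and treating a trailing dash as `00` is an assumption, since the question does not say what the dash should become.

```python
import re

def normalize_price(raw):
    """Use a comma as decimal separator and drop thousands separators."""
    # Assumption: a trailing ",-" or ".-" stands for zero cents.
    if re.search(r'[.,]-$', raw):
        whole, cents = raw[:-2], '00'
    # Third character from the right is a dot or comma: two decimals follow.
    elif len(raw) >= 3 and raw[-3] in '.,':
        whole, cents = raw[:-3], raw[-2:]
    # Neither check applies: the number has no decimal part.
    else:
        whole, cents = raw, None
    # Any separators left in the whole part are thousands separators.
    whole = re.sub(r'[.,]', '', whole)
    return whole if cents is None else whole + ',' + cents

for raw in ['1.000,59', '2,456,777.00', '59,-', '1.000', '59']:
    print(raw, '->', normalize_price(raw))
```

Regex is used here only for the dash test and for stripping separators; the right-to-left check is plain string indexing.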

Edit: 编辑:

Sorry, I did not read the problem description carefully enough. I think you need two regex patterns to solve this: first do a re.sub(), then a re.findall().

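import re

# A separator (dot or comma) followed by 1-3 digits or a dash.
# Group 1 is the whole match, group 2 the separator, group 3 the digits or dash.
pattern = re.compile(r'(([.,]{1})(\d{1,3}|-))')
s = "2,456,777.00  xxxxxxxxxxxxx 59,789,- xxxxxxxxxxxx 59,-  xxxxxxxxxx 1.000,59"

def subs(m):
    g0 = m.group(0)
    g3 = m.group(3)

    if g3 == '-':
        g3 = '00'  # a dash stands for zero cents
    if len(g0) == 4:
        # separator + three digits: a thousands group, normalised to a comma
        return ',' + g0[1:4]
    else:
        # anything shorter is the decimal part, normalised to a dot
        return '.' + g3

c = re.findall(r'[\d.,-]+', re.sub(pattern, subs, s))
print(c)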

>> ['2,456,777.00', '59,789.00', '59.00', '1,000.59']
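If numeric values are needed afterwards, strings in this normalised form (comma for thousands, dot for decimals) could be converted as in the sketch below. The variable names are hypothetical, and `Decimal` is chosen over `float` on the assumption that exact cents matter.

```python
from decimal import Decimal

# The normalised strings from the output above.
normalized = ['2,456,777.00', '59,789.00', '59.00', '1,000.59']

# Drop the thousands separators, then parse the remaining text exactly.
amounts = [Decimal(s.replace(',', '')) for s in normalized]
print(amounts)
```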

A little cumbersome indeed. I hope someone can come up with a smarter solution.
