
Remove double quotes enclosing numbers using regex

I'm working with the following string:

'"name": "Gnosis", \n        "symbol": "GNO", \n        "rank": "99", \n        "price_usd": "175.029", \n        "price_btc": "0.0186887", \n        "24h_volume_usd": "753877.0"'

and I have to use re.sub() in Python to remove only the double quotes ( " ) that enclose the numbers, so that I can parse it as JSON later. I've tried some regular expressions, but without success. Here is my best attempt:

exp = re.compile(r': (")\D+\.*\D*(")', re.MULTILINE)
response = re.sub(exp, "", string)

I've searched a lot for a similar problem, but have not found a similar question.

EDIT:

Finally I've used (thanks to S. Kablar):

import json
import re
formatted = re.sub(r'"(-*\d+(?:\.\d+)?)"', r"\1", string)
parsed = json.loads(formatted)
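
As a minimal sketch of the approach in this edit, the snippet below applies the substitution to the sample string and parses the result. It assumes the payload is wrapped in braces (the string quoted in the question is not), and the name `raw` is only illustrative. It also uses `-?` (at most one minus sign) instead of `-*`, since a value such as "--5" would not be valid JSON once unquoted.

```python
import json
import re

# Sample payload; the surrounding braces are an assumption so that it is valid JSON.
raw = ('{"name": "Gnosis", \n        "symbol": "GNO", \n        "rank": "99", \n'
       '        "price_usd": "175.029", \n        "price_btc": "0.0186887", \n'
       '        "24h_volume_usd": "753877.0"}')

# Drop the quotes only when they directly enclose a number.
formatted = re.sub(r'"(-?\d+(?:\.\d+)?)"', r"\1", raw)
parsed = json.loads(formatted)

# The JSON parser now decides between int and float on its own.
print(type(parsed["rank"]), type(parsed["price_usd"]), type(parsed["name"]))
```

The key "24h_volume_usd" is left alone because the digits are not immediately followed by a closing quote.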

The problem is that this endpoint returns badly formatted JSON, with the numbers encoded as strings.

Other users answered "Parse the string first with json, and later convert numbers to float" using a for loop, which I think is a very inefficient way to do it; you are also forced to choose between the int and float types for each value yourself. To settle the doubt, I wrote this gist comparing the different approaches with benchmarks, and for now I am going to trust the regex in this case.
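
The gist itself is not reproduced here. As a rough, hypothetical sketch of how such a comparison could be set up with `timeit`, assuming a small braces-wrapped payload (the function names `with_regex` and `with_loop` are made up for illustration):

```python
import json
import re
import timeit

raw = '{"rank": "99", "price_usd": "175.029", "name": "Gnosis"}'
NUMBER = re.compile(r'"(-?\d+(?:\.\d+)?)"')

def with_regex(text):
    # Unquote the numbers first, then let the JSON parser pick int or float.
    return json.loads(NUMBER.sub(r"\1", text))

def with_loop(text):
    # Parse first, then convert each value that looks numeric.
    result = {}
    for key, value in json.loads(text).items():
        try:
            value = int(value) if value.strip().isdigit() else float(value)
        except ValueError:
            pass  # not a number, keep the string
        result[key] = value
    return result

regex_time = timeit.timeit(lambda: with_regex(raw), number=10_000)
loop_time = timeit.timeit(lambda: with_loop(raw), number=10_000)
print(f"regex: {regex_time:.4f}s  loop: {loop_time:.4f}s")
```

The timings depend on the machine and on the size of the payload, so they are worth measuring on real responses rather than taken from a small sample like this one.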

Thanks everyone for your help.

Parse the string first with json, and later convert numbers to floats:

import json

string = '{"name": "Gnosis", \n        "symbol": "GNO", \n        "rank": "99", \n        "price_usd": "175.029", \n        "price_btc": "0.0186887", \n        "24h_volume_usd": "753877.0"}'

data = json.loads(string)
response = {}
for key, value in data.items():
    try:
        value = int(value) if value.strip().isdigit() else float(value)
    except ValueError:
        pass
    response[key] = value
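
A variation on the loop above, offered only as a sketch: `json.loads` accepts an `object_hook` callable that receives every decoded object, so the conversion can run during parsing and also reach nested objects. The helper name `convert_numbers` and the sample payload are hypothetical. Trying `int` before `float` keeps negative integers such as "-5" as integers.

```python
import json

def convert_numbers(obj):
    # json.loads calls this for every decoded JSON object (dict), innermost first.
    converted = {}
    for key, value in obj.items():
        if isinstance(value, str):
            try:
                value = int(value)
            except ValueError:
                try:
                    value = float(value)
                except ValueError:
                    pass  # not numeric, keep the original string
        converted[key] = value
    return converted

# Hypothetical payload with a negative integer and a nested object.
string = ('{"rank": "99", "change": "-5", "price_usd": "175.029", '
          '"name": "Gnosis", "nested": {"volume": "753877.0"}}')
data = json.loads(string, object_hook=convert_numbers)
print(data)
```

Note that `float()` also accepts strings such as "nan" and "inf", so a text field holding one of those words would be converted as well.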

Regex: "(-?\d+(?:[\.,]\d+)?)" Substitution: \1

Details:

  • () Capturing group
  • (?:) Non-capturing group
  • \d Matches a digit (equal to [0-9])
  • + Matches between one and unlimited times
  • ? Matches between zero and one times
  • \1 Group 1.
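
The same breakdown can be written into the pattern itself with `re.VERBOSE`, which ignores whitespace and allows comments inside the regex. This is only an annotated restatement of the pattern above; the name `NUMBER_IN_QUOTES` is illustrative.

```python
import re

NUMBER_IN_QUOTES = re.compile(r'''
    "               # opening double quote, matched but not captured
    (               # start of capturing group 1
      -?            # optional minus sign
      \d+           # one or more digits
      (?:           # non-capturing group for the fractional part
        [\.,]\d+    # a point or comma followed by one or more digits
      )?            # the fractional part is optional
    )               # end of group 1
    "               # closing double quote
''', re.VERBOSE)

match = NUMBER_IN_QUOTES.search('"rank": "99"')
whole = match.group(0)   # the full match, quotes included
number = match.group(1)  # group 1 only, which is what \1 refers to
print(whole, number)
```

Substituting `\1` for the whole match is what removes the quotes while keeping the number.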

Python code:

import re

def remove_quotes(text):
    return re.sub(r"\"(-?\d+(?:[\.,]\d+)?)\"", r'\1', text)

remove_quotes('"percent_change_7d": "-23.43"') >> "percent_change_7d": -23.43
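
One caveat worth checking, shown here as a small sketch with made-up payloads: the character class `[\.,]` also accepts a comma as the decimal separator, and a value such as "1,5" is not a valid JSON number once its quotes are removed.

```python
import json
import re

PATTERN = re.compile(r'"(-?\d+(?:[\.,]\d+)?)"')

def remove_quotes(text):
    return PATTERN.sub(r"\1", text)

# A point as the separator gives valid JSON.
ok = remove_quotes('{"percent_change_7d": "-23.43"}')
ok_parsed = json.loads(ok)

# A comma as the separator is unquoted too, and the result no longer parses.
bad = remove_quotes('{"price": "1,5"}')
try:
    json.loads(bad)
    bad_is_valid = True
except json.JSONDecodeError:
    bad_is_valid = False

print(ok_parsed, bad, bad_is_valid)
```

If the data never uses a comma separator, restricting the class to `\.` avoids this case.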

You came close. You want to keep the numbers and the colon, so those are what you need to put in parentheses, not the quotes. Also, digits are \d, not \D (that would be non-digits).

So:

exp = re.compile(r'(: *)"(\d+\.?\d*)"', re.MULTILINE)
response = re.sub(exp, "\\1\\2", string)

\d+\.?\d* means "one or more digits, an optional point, then any number of digits".
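
A short sketch of what keeping the colon in the match buys, using a hypothetical payload whose key is itself made of digits. A pattern that only looks at the quotes unquotes the key as well, while the colon-anchored pattern touches values only.

```python
import json
import re

unanchored = re.compile(r'"(-?\d+(?:\.\d+)?)"')
anchored = re.compile(r'(: *)"(\d+\.?\d*)"')

text = '{"99": "99"}'  # hypothetical payload with a numeric-looking key

without_colon = unanchored.sub(r"\1", text)   # the key loses its quotes too
with_colon = anchored.sub(r"\1\2", text)      # only the value is unquoted

print(without_colon)
print(with_colon)
print(json.loads(with_colon))
```

As written, the anchored pattern has no minus sign in it, so negative values stay quoted.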

Border cases

The above doesn't cover ".125", which has no digits before the point.

And if you changed it to "\d*\.?\d*", that would match ".", since it reads "any digits, an optional point, any digits".

I think the only practicable way is

 (\d+\.?\d*|\.\d+)

with | meaning "or": so, either one or more digits optionally followed by a point and any digits (this matches "17."), or a point followed by at least one digit. Unfortunately, "\d+\.?\d+" does not match "5".

Or you specify all three cases:

 (\d+|\d+\.?\d+|\.\d+)

First integers (\d+), then numbers with at least two digits and an optional point between them, then decimal parts alone with no leading zero.
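
To compare the candidate patterns on the border cases discussed above, a small sketch can run each one through `re.fullmatch`, which requires the whole sample to match, as it would between the two quotes. The labels and sample list are illustrative only. The last lines also check which samples `json.loads` accepts as numbers on their own.

```python
import json
import re

candidates = {
    "digits, optional point, any digits": r"\d+\.?\d*",
    "any digits, optional point, any digits": r"\d*\.?\d*",
    "two alternatives": r"\d+\.?\d*|\.\d+",
    "three alternatives": r"\d+|\d+\.?\d+|\.\d+",
}
samples = ["5", "17.", ".125", ".", "1.5"]

results = {}
for label, pattern in candidates.items():
    # Keep the samples that the pattern matches in full.
    results[label] = [s for s in samples if re.fullmatch(pattern, s)]
    print(f"{label:40} {results[label]}")

def is_json_number(s):
    # True when the bare text parses as a JSON value.
    try:
        json.loads(s)
        return True
    except json.JSONDecodeError:
        return False

json_ok = [s for s in samples if is_json_number(s)]
print("accepted by json.loads:", json_ok)
```

JSON requires digits on both sides of the decimal point, so values like "17." and ".125" would still fail in `json.loads` after unquoting; matching them is only useful if they are normalized afterwards.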
