Using regex and "or" in order to get rid of unwanted characters

I'm getting very confused with regex and I need help. I have the following string:

x='def{{{12.197835/// -0.001172, 12.19788 7.3E-5, //+{{12.196705 -1.7E-5, 12.196647 -0.001189///}}}Def'

This string is part of a cell in a specific column of a pandas DataFrame. Each cell has different unwanted characters, mainly letters and "/" or "{".

I want to have this output:

x='12.197835,-0.001172, 12.19788,7.3E-5,12.196705 ,-1.7E-5, 12.196647 -0.001189'

(Get rid of anything that is not a digit, except a "-" directly before a number, or an "E-" that has a digit before it.)

I have used this expression in order to get only the digits:

print(re.findall(r"\d+\.*\d*",x))
>>>['12.197835', '0.001172', '12.19788', '7.3', '5', '12.196705', '1.7', '5', '12.196647', '0.001189']

But my problem is that this expression does not preserve the '-' or the 'E'. I have tried to keep them with the following expression:

print(re.findall(r"\d+\.*\d*",x) or (r"^-?[0-9]\d+\.*\d+*\[E-]",x))

but I get the same output:


>>>['12.197835', '0.001172', '12.19788', '7.3', '5', '12.196705', '1.7', '5', '12.196647', '0.001189']

I thought maybe it is because I'm using "or" and the first condition is already satisfied, so I also tried "and", but that gives very weird results:

>>>('^-?[0-9]\\d+\\.*\\d+*\\[E-]', 'def{{{12.197835/// -0.001172, 12.19788 7.3E-5, //+{{12.196705 -1.7E-5, 12.196647 -0.001189///}}}Def')

My end goal is to get the first string with only digits, '-', and 'E' followed by '-' (the desired output):

x='12.197835,-0.001172, 12.19788,7.3E-5,12.196705 ,-1.7E-5, 12.196647 -0.001189'

You may use

import re
x='def{{{12.197835/// -0.001172, 12.19788 7.3E-5, //+{{12.196705 -1.7E-5, 12.196647 -0.001189///}}}Def'
print(re.findall(r'[+-]?\d*\.?\d+(?:[eE][+-]?\d+)?', x))  # Extracting all numbers into a list
# => ['12.197835', '-0.001172', '12.19788', '7.3E-5', '12.196705', '-1.7E-5', '12.196647', '-0.001189']
print(",".join(re.findall(r'[+-]?\d*\.?\d+(?:[eE][+-]?\d+)?', x))) # Creating a comma-separated string
# => 12.197835,-0.001172,12.19788,7.3E-5,12.196705,-1.7E-5,12.196647,-0.001189

See the Python demo and the regex demo.
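As for why the "or" attempt printed the same list: in Python, `a or b` evaluates to `a` whenever `a` is truthy, and a non-empty list is truthy, so the tuple on the right was never used. `a and b` evaluates to `b` in that case, which is why the tuple itself got printed. Neither operator combines two patterns. A minimal sketch of that behaviour, using a shortened, made-up sample string rather than the original one:

```python
import re

sample = "ab{12.5/ -0.25, 7.3E-5}"  # shortened, made-up sample for illustration

digits_only = re.findall(r"\d+\.*\d*", sample)
other = (r"some pattern", sample)  # just a tuple; re.findall is never called on it

# `or` returns its first truthy operand: the non-empty list from findall.
result_or = digits_only or other
# `and` returns its last operand when the first is truthy: the tuple.
result_and = digits_only and other

print(result_or)
print(result_and)
```

To match several alternatives, they have to be expressed inside a single pattern (for example with optional groups or `|`), as the pattern above does.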

Regex details

  • [+-]? - an optional + or -
  • \d* - zero or more digits
  • \.? - an optional .
  • \d+ - one or more digits
  • (?:[eE][+-]?\d+)? - an optional occurrence of e or E followed by an optional + or - and then one or more digits.
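The breakdown above can also be written into the pattern itself with `re.VERBOSE`, and wrapped in a small helper that cleans one cell at a time. This is only a sketch: the names `NUMBER`, `extract_numbers`, and `cells`, and the second sample cell, are made up for illustration and are not from the question.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is commented.
NUMBER = re.compile(
    r"""
    [+-]?                  # optional sign
    \d* \.? \d+            # optional digits, optional dot, at least one digit
    (?: [eE] [+-]? \d+ )?  # optional exponent such as E-5
    """,
    re.VERBOSE,
)


def extract_numbers(cell):
    """Return the numbers found in one cell as a comma-separated string."""
    return ",".join(NUMBER.findall(cell))


cells = [
    "def{{{12.197835/// -0.001172, 12.19788 7.3E-5, //+{{12.196705 -1.7E-5, 12.196647 -0.001189///}}}Def",
    "abc//{3.5 -2E-3}}",  # made-up second cell
]

cleaned = [extract_numbers(c) for c in cells]
print(cleaned)

# The extracted tokens are valid float literals, so they can be converted directly.
values = [float(tok) for tok in cleaned[0].split(",")]
print(values)
```

With a pandas DataFrame, the same helper could presumably be applied per cell with something like `df["col"].apply(extract_numbers)`, where `df` and `"col"` stand in for the DataFrame and column name, which the question does not give.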

Hope this helps you (without using regex).

x='def{{{12.197835/// -0.001172, 12.19788 7.3E-5, //+{{12.196705 -1.7E-5, 12.196647 -0.001189///}}}Def'

x=x.replace('{','').replace('}','').replace('def','').replace('Def','').replace('/','').replace('  ',' ').replace(' ',',').replace(',,',',')

print(x)

[Result]:

12.197835,-0.001172,12.19788,7.3E-5,+12.196705,-1.7E-5,12.196647,-0.001189
