Extract scientific number from string
I am trying to extract numbers in scientific notation from lines in a text file. Something like this:
Example:
str = 'Name of value 1.111E-11 Next Name 444.4'
Result:
[1.111E-11, 444.4]
I've tried the solutions from other posts, but they seem to work only for integers:
>>> [int(s) for s in str.split() if s.isdigit()]
[]
float() would work, but I get an error every time it is applied to a token that is not a number.
>>> float(str.split()[3])
1.111e-11
>>> float(str.split()[2])
ValueError: could not convert string to float: value
Thanks in advance for your help!!
This can be done with regular expressions:
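import re
s = 'Name of value 1.111E-11 Next Name 444.4'
# Raw string, so the backslashes reach the regex engine unchanged.
match_number = re.compile(r'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?')
# The pattern tolerates spaces inside a number, but float() does not, so remove them first.
final_list = [float(x.replace(' ', '')) for x in match_number.findall(s)]
print(final_list)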
Output:
[1.111e-11, 444.4]
Note that the pattern above requires at least one digit to the left of the decimal point, so a value written as .5 would be picked up as 5.
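If values such as .5 or an explicit plus sign can appear in the input, the pattern can be loosened. The following is only a sketch of one possible variant, not the pattern from the answer: the names NUMBER and extract_numbers are made up for illustration, and this version does not allow spaces inside a number.

```python
import re

# Hypothetical variant of the pattern above: accepts "1.5", "1.", ".5" and "1",
# each with an optional sign and an optional exponent such as "E-11" or "e+3".
NUMBER = re.compile(r'[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee][-+]?[0-9]+)?')

def extract_numbers(text):
    """Return every number found in text as a float."""
    # findall returns whole matches here because all groups are non-capturing.
    return [float(m) for m in NUMBER.findall(text)]

print(extract_numbers('Name of value 1.111E-11 Next Name 444.4'))
print(extract_numbers('offset .5 gain -2.5e+3'))
```

The second call shows the two cases the stricter pattern would not read as intended: a leading decimal point and a signed exponent.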
EDIT:
Here's a tutorial and reference I found helpful for learning how to write regex patterns.
Since you asked for an explanation of the regex pattern:
'-?\ *[0-9]+\.?[0-9]*(?:[Ee]\ *-?\ *[0-9]+)?'
One piece at a time:
-? optionally matches a negative sign (zero or one negative signs)
\ * matches any number of spaces (to allow for formatting variations like - 2.3 or -2.3)
[0-9]+ matches one or more digits
\.? optionally matches a period (zero or one periods)
[0-9]* matches any number of digits, including zero
(?: ... ) groups an expression, but without forming a "capturing group" (look it up)
[Ee] matches either "e" or "E"
\ * matches any number of spaces (to allow for formats like 2.3E5 or 2.3E 5)
-? optionally matches a negative sign
\ * matches any number of spaces
[0-9]+ matches one or more digits
? makes the entire non-capturing group optional (to allow for the presence or absence of the exponent, e.g. 3000 or 3E3)
Note: \d is a shortcut for [0-9], but I'm just used to using [0-9].
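The piece-by-piece breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE. This is a sketch of the same pattern in that style. The names verbose_number and parse_all are chosen here for illustration, and the space removal before float() is an addition of this sketch, since float() rejects strings such as '- 2.3'.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries its
# own comment. In verbose mode unescaped whitespace is ignored, while the
# escaped space still matches a literal space.
verbose_number = re.compile(r'''
    -?              # optional negative sign
    \ *             # any number of spaces
    [0-9]+          # one or more digits
    \.?             # optional decimal point
    [0-9]*          # any number of digits, including none
    (?:             # start of the non-capturing exponent group
        [Ee]        # either e or E
        \ *         # any number of spaces
        -?          # optional negative sign
        \ *         # any number of spaces
        [0-9]+      # one or more digits
    )?              # the whole exponent group is optional
''', re.VERBOSE)

def parse_all(text):
    """Find every match and convert it, dropping spaces that float() rejects."""
    return [float(x.replace(' ', '')) for x in verbose_number.findall(text)]

print(parse_all('Name of value 1.111E-11 Next Name 444.4'))
print(parse_all('x - 2.3 y 2.3E 5'))
```

The second call exercises the spaced formats mentioned in the breakdown (- 2.3 and 2.3E 5).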
You could always just use a for loop and a try-except statement.
>>> string = 'Name of value 1.111E-11 Next Name 444.4'
>>> final_list = []
>>> for elem in string.split():
...     try:
...         final_list.append(float(elem))
...     except ValueError:
...         pass
...
>>> final_list
[1.111e-11, 444.4]
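One thing to watch with this approach is that float() also accepts words such as "nan" and "inf". A minimal sketch of the same loop wrapped in a function, with those values filtered out, might look like this (the function name floats_in is made up for illustration):

```python
import math

def floats_in(text):
    """Collect the whitespace-separated tokens of text that parse as finite floats."""
    numbers = []
    for token in text.split():
        try:
            value = float(token)
        except ValueError:
            continue  # not a number, skip it
        # float() also accepts words like "nan" and "inf"; drop those here.
        if math.isfinite(value):
            numbers.append(value)
    return numbers

print(floats_in('Name of value 1.111E-11 Next Name 444.4'))
print(floats_in('inf nan 3e3 value'))
```

Because the text is split on whitespace, a token with attached punctuation, such as 444.4, with a trailing comma, is skipped, since float() rejects it.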
I'd use a regex:
import re
s = 'Name of value 1.111E-11 Next Name 444.4'
# The dot is escaped so it matches only a decimal point; the exponent sign is optional.
print([float(x) for x in re.findall(r"-?\d+\.?\d*(?:[Ee][-+]?\d+)?", s)])
Output:
[1.111e-11, 444.4]