Extracting numeric values from a file with regular expressions and summing them

Question

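I'm a beginner learning Python. The problem is that I have to extract the numbers from a file and find their sum. The numbers can be anywhere and may appear several times on the same line; some lines may contain no numbers, and some lines may be blank. I did figure out how to solve it, and this is my code: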

import re
new=[]
s=0
fhand=open("sampledata.txt")
for line in fhand:
    if re.search('^.+',line):         # to exclude lines which have nothing
        y=re.findall('([0-9]*)',line) # this part is supposed to extract only the
        for i in range(len(y)):       # numerical part, but it extracts all the words. why?
            try:
                y[i]=float(y[i])
            except:
                y[i]=0
        s=s+sum(y)
print(s)

The code works, but it is not a Pythonic way to do this. Why does ([0-9]*) extract all the words instead of only the numbers? What is the Pythonic way to do it?

Answer 1

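Your regular expression has ([0-9]*), which matches zero or more digits, so it also matches the empty string at every position where there is no digit. You probably want ([0-9]+), which requires at least one digit.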
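A quick check makes the difference visible. This is only a sketch: the sample string below is made up for illustration and is not taken from the asker's file.

```python
import re

line = "abc 12 de 7"  # hypothetical sample line, not from the original file

# '*' allows zero digits, so the pattern also matches the empty string
# wherever no digit is found.
star = re.findall(r'([0-9]*)', line)

# '+' requires at least one digit, so only the numbers are returned.
plus = re.findall(r'([0-9]+)', line)

print(star)  # the numbers, surrounded by many empty strings
print(plus)  # ['12', '7']
```

The empty strings are what the try/except in the question ends up turning into 0, which is why the sum still comes out right.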

Answer 2

Hi, you made a mistake in the regular expression by adding the "*". It should look like this:

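y=re.findall('([0-9])',line)

Note that without a quantifier this pattern matches one digit at a time, so a number such as 42 comes back as '4' and '2'. Use [0-9]+ to keep multi-digit numbers whole.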

Answer 3

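Expanding on wind85's answer, you may want to fine-tune the regular expression depending on what kind of numbers you expect to find in the file. For example, if your numbers may contain a decimal point, you might want something like [0-9]+(?:\.[0-9]+)? (one or more digits, optionally followed by a period and one or more digits).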
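As a rough illustration of the decimal-aware pattern, the sketch below runs it on an invented line; the name NUMBER and the sample text are assumptions made for the example.

```python
import re

# One or more digits, optionally followed by a period and more digits.
# The group is non-capturing (?:...) so findall returns the whole match.
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]+)?')

line = "width 3.5, height 10, ratio 0.25"  # hypothetical sample line
values = [float(m) for m in NUMBER.findall(line)]

print(values)       # [3.5, 10.0, 0.25]
print(sum(values))  # 13.75
```

If the group were a capturing one, re.findall would return only the fractional part of each match, so the (?:...) form matters here.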

As for making it more Pythonic, I would probably write it like this:

import re

s=0
for line in open("sampledata.txt"):
    s += sum(float(y) for y in re.findall(r'[0-9]+',line))
print(s)

If you really want to get fancy, you can make it a one-liner:

print(sum(float(y) for line in open('sampledata.txt')
          for y in re.findall(r'[0-9]+',line)))

But personally I find that kind of thing hard to read.
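For reuse, the same loop could be wrapped in a small function that also closes the file when it is done. This is a sketch rather than part of the original answer: the name sum_numbers_in_file and the decision to accept decimals are assumptions made for illustration.

```python
import re

# Digits with an optional fractional part (assumed number format).
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]+)?')

def sum_numbers_in_file(path):
    """Return the sum of all numbers found in the text file at path."""
    total = 0.0
    with open(path) as fhand:  # the file is closed when the block exits
        for line in fhand:
            # Lines without numbers yield an empty list, which sums to 0.
            total += sum(float(y) for y in NUMBER.findall(line))
    return total
```

Calling sum_numbers_in_file("sampledata.txt") would then give the total for the file from the question.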
