
Using Regular Expressions to extract numerical quantities from a file and find the sum

I am a beginner learning Python. I have to extract numbers from a file and find their sum. The numbers can be anywhere and can appear several times on the same line; some lines have no numbers at all, and some lines are empty. I worked out a solution, and this is my code:

import re
new=[]
s=0
fhand=open("sampledata.txt")
for line in fhand:
    if re.search('^.+',line):         #to exclude lines which have nothing
        y=re.findall('([0-9]*)',line) #this part is supposed to extract only
        for i in range(len(y)):       #the numerical part, but it extracts all the words. why?
            try:
                y[i]=float(y[i])
            except:
                y[i]=0
        s=s+sum(y)
print(s)

The code works, but it is not a Pythonic way to do it. Why is ([0-9]*) extracting all the words instead of only the numbers? What is the Pythonic way to do it?

Your regular expression is ([0-9]*), which matches zero or more digits, so it also produces an empty match at every position where there is no digit. You probably want ([0-9]+) instead, which requires at least one digit.
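To see the difference, both patterns can be run against the same line. This is only a minimal sketch: the sample string is made up for illustration and is not taken from sampledata.txt.

```python
import re

# Hypothetical sample line, not taken from sampledata.txt
line = "abc 12 de 345"

# '*' allows zero digits, so findall also reports empty matches
star_matches = re.findall(r'([0-9]*)', line)

# '+' requires at least one digit, so only the digit runs come back
plus_matches = re.findall(r'([0-9]+)', line)

print(star_matches)  # the digit runs mixed in with empty strings
print(plus_matches)  # ['12', '345']
```

The empty strings are what the try/except in the question was absorbing: float('') raises ValueError, so each of them was replaced by 0 before summing.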

Hi, the mistake in your regular expression is the "*". It should look like this:

y=re.findall('([0-9]+)',line)

Expanding on wind85's answer, you might want to fine-tune your regular expression depending on what kind of numbers you expect to find in your file. For example, if your numbers might have a decimal point in them, then you might want something like [0-9]+(?:\.[0-9]+)? (one or more digits, optionally followed by a period and one or more digits).
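As a rough illustration of the decimal-aware pattern, here it is written as a raw string with a single backslash before the period. The sample line and the name NUMBER are assumptions made for this sketch, not part of the original question.

```python
import re

# One or more digits, optionally followed by a period and more digits
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]+)?')

# Hypothetical sample line for illustration
line = "width 3.5 cm, height 12 cm, v2"

values = [float(m) for m in NUMBER.findall(line)]
print(values)       # [3.5, 12.0, 2.0]
print(sum(values))  # 17.5
```

The group is non-capturing, (?:...), on purpose: with a capturing group, re.findall would return only the group's text rather than the whole number. Note that the digit inside "v2" is counted as well, which may or may not be what you want for your data.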

As for making it more Pythonic, here's how I'd probably write it:

import re

s = 0
with open("sampledata.txt") as fhand:
    for line in fhand:
        # add up every run of digits found on this line
        s += sum(float(y) for y in re.findall(r'[0-9]+', line))
print(s)

If you want to get really fancy, you can make it a one-liner:

print(sum(float(y) for line in open('sampledata.txt')
                   for y in re.findall(r'[0-9]+', line)))

but personally I find that kind of thing hard to read.
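A middle ground between the loop and the one-liner is to wrap the logic in a small function. The sketch below is one possible arrangement, not the only one; the names sum_numbers and NUMBER are hypothetical, and it assumes the decimal-aware pattern is the one you want.

```python
import re

# Assumed pattern: whole numbers with an optional decimal part
NUMBER = re.compile(r'[0-9]+(?:\.[0-9]+)?')


def sum_numbers(path):
    """Return the sum of every number found in the file at `path`."""
    total = 0.0
    with open(path) as fhand:  # the file is closed when the block ends
        for line in fhand:
            # lines without numbers contribute an empty sum, i.e. 0
            total += sum(float(m) for m in NUMBER.findall(line))
    return total
```

Calling sum_numbers("sampledata.txt") would then give the total for the file from the question. Blank lines need no special check, because findall simply returns an empty list for them.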


 