Using Regular Expressions to extract numerical quantities from a file and find the sum

I am a beginner learning Python. I have to extract the numbers from a file and find their sum. The numbers can appear anywhere and several times on the same line; some lines have no numbers at all, and some lines are empty. I worked out a solution, and this is my code:

import re
new=[]
s=0
fhand=open("sampledata.txt")
for line in fhand:
    if re.search('^.+',line):         #to exclude lines which have nothing
        y=re.findall('([0-9]*)',line) #this part is supposed to extract only the
        for i in range(len(y)):       #numerical part, but it extracts all the words. why?
            try:
                y[i]=float(y[i])
            except:
                y[i]=0
        s=s+sum(y)
print(s)

The code works, but it is not a pythonic way to do it. Why is the ([0-9]*) extracting all the words instead of only numbers? What is the pythonic way to do it?

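Your regular expression is ([0-9]*), and * means "zero or more digits". Zero digits is a valid match, so findall also returns an empty string wherever the text has no digit, which is why you get so many extra entries. You probably want ([0-9]+) instead, which requires at least one digit.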
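To see the difference between * and +, here is a minimal sketch on a made-up sample line (the string and the variable names are only for illustration, they are not from the question's file):

```python
import re

line = "abc 12 de 3"

# '*' allows zero digits, so findall also reports empty matches
star_matches = re.findall(r'[0-9]*', line)

# '+' requires at least one digit, so only actual digit runs are reported
plus_matches = re.findall(r'[0-9]+', line)

print(star_matches)  # '12' and '3' mixed in with empty strings
print(plus_matches)  # ['12', '3']
```

The empty strings are what the try/except in the question ends up turning into 0.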

Hi, the mistake in your regular expression is the "*". Use "+" instead so that every match contains at least one digit:

y=re.findall('([0-9]+)',line)

Expanding on wind85's answer, you might want to fine-tune your regular expression depending on what kind of numbers you expect to find in your file. For example, if your numbers might have a decimal point in them, then you might want something like [0-9]+(?:\.[0-9]+)? (one or more digits, optionally followed by a period and one or more digits).
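As a quick check of the decimal pattern, here is a small sketch. The sample line is invented for the example; the pattern is the one described above, written as a raw string so the backslash reaches the regex engine unchanged:

```python
import re

# one or more digits, optionally followed by '.' and more digits
number = re.compile(r'[0-9]+(?:\.[0-9]+)?')

line = "width 3.5, height 12, depth 0.25"

# the group is non-capturing, so findall returns the whole match
found = number.findall(line)
total = sum(float(x) for x in found)

print(found)  # ['3.5', '12', '0.25']
print(total)  # 15.75
```

Note that this pattern still ignores signs and exponents, so "-4" contributes 4 and "1e3" is read as 1 and 3. Whether that matters depends on the data.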

As for making it more pythonic, here's how I'd probably write it:

import re
s=0
for line in open("sampledata.txt"):
    s += sum(float(y) for y in re.findall(r'[0-9]+',line))
print(s)

If you want to get really fancy, you can make it a one-liner:

print(sum(float(y) for line in open('sampledata.txt')
                   for y in re.findall(r'[0-9]+',line)))

but personally I find that kind of thing hard to read.
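A middle ground, if you want something reusable and easy to test, is to put the logic in a small function that takes any iterable of lines. This is only a sketch; the names NUMBER and sum_numbers and the sample list are made up for the example:

```python
import re

# compiled once; matches each run of one or more digits
NUMBER = re.compile(r'[0-9]+')

def sum_numbers(lines):
    """Return the sum of every run of digits found in an iterable of lines."""
    return sum(float(y) for line in lines for y in NUMBER.findall(line))

# lines with several numbers, an empty line, and a line with no numbers
sample = ["a 1 b 2\n", "\n", "no numbers here\n", "x10y20\n"]
print(sum_numbers(sample))  # 33.0
```

To run it on the file from the question, pass the open file object, for example sum_numbers(fhand) inside a with open("sampledata.txt") as fhand: block, which also closes the file when the block ends.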

