
Is there a better way to read an element from a file in Python?

I have written a crude Python program to pull phrases from an index in a CSV file and write these rows to another file.

import csv

total = 0

ifile = open('data.csv', "rb")
reader = csv.reader(ifile)

ofile = open('newdata_write.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for row in reader:
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1   
        writer.writerow(row) 
    elif ("even more text I'm looking for") in row[x]:  
        total = total + 1   
        writer.writerow(row)

   < many, many more lines >

print "\nTotal = %d." % total

ifile.close()

My question is this: Isn't there a better (more elegant, less verbose) Pythonic way to do this? I feel this is a case of not knowing what I don't know. The CSV file I'm searching is not large (3863 lines, 669 KB), so I don't think it is necessary to use SQL to solve this, although I am certainly open to that.

I am a Python newbie, in love with the language and teaching myself through the normal channels (books, tutorials, Project Euler, Stack Overflow).

Any suggestions are greatly appreciated.

You're looking for any with a generator expression:

matches = "some text", "some more text", "even more text I'm looking for"
for row in reader:
    if any(match in row[x] for match in matches):
        total += 1   
        writer.writerow(row)

Alternatively, you could just write all the rows at once:

writer.writerows(row for row in reader if any(match in row[x] for match in matches))

but as written that doesn't get you a total.
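If you want the single writerows call and still need the total, one option is to count rows as they pass through a small generator. This is only a sketch, written for Python 3 rather than the Python 2 used in the question, and it runs on in-memory data. The names matching_rows, counter, and col, and the sample rows, are hypothetical and introduced for illustration; col stands in for x above.

```python
import csv
import io

matches = ("some text", "some more text", "even more text I'm looking for")
col = 0  # hypothetical column index, stands in for x in the question


def matching_rows(reader, counter):
    """Yield rows whose chosen column contains any phrase, counting as we go."""
    for row in reader:
        if any(match in row[col] for match in matches):
            counter[0] += 1
            yield row


# Made-up sample data in place of data.csv and newdata_write.csv.
source = io.StringIO("some text here,1\nnothing relevant,2\nand some more text,3\n")
target = io.StringIO()

counter = [0]  # a one-element list so the generator can update it in place
writer = csv.writer(target, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerows(matching_rows(csv.reader(source), counter))

total = counter[0]
print("Total = %d." % total)
```

The count is only final once writerows has consumed the generator, so read it after the call, not before.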

It's not a huge improvement, but you could do something like

keyphraseList = (
     "some text",
     "some more text",
     "even more text I'm looking for")

...
for row in reader:
    for phrase in keyphraseList:
        if phrase in row[x]:
            total = total + 1
            writer.writerow(row)
            break

(not tested)

Not necessarily "better", but I would compare the item to a set and clean up the total a bit. It is at least more succinct.

This

for row in reader:
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1   
        writer.writerow(row) 
    elif ("even more text I'm looking for") in row[x]:  
        total = total + 1   
        writer.writerow(row)

becomes

myWords = set(('some text','some more text','even more'))
for row in reader:
    if row[x] in myWords:
        total += 1
        writer.writerow(row)

You could just use a simple list, but membership tests on a set stay fast as the number of phrases grows (average constant time, versus a linear scan for a list).

In response to the comment by agf:

>>> x = set(('something','something else'))
>>> True if 'some' in x else False
False
>>> True if 'something' in x else False
True

Is this what you're saying would not work?
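For what it's worth, the set test and the original test are not equivalent: in on a set asks whether the whole cell equals one of the phrases, while in on a string asks whether the phrase occurs anywhere inside the cell. A small sketch of the difference, written for Python 3; the cell value is made up for illustration.

```python
phrases = {'some text', 'some more text', 'even more'}
cell = 'this cell has some text in it'  # hypothetical value of row[x]

# Set membership: True only if the entire cell equals one of the phrases.
exact_match = cell in phrases

# Substring search: True if any phrase appears anywhere inside the cell.
substring_match = any(phrase in cell for phrase in phrases)

print(exact_match)      # False
print(substring_match)  # True
```

So the set version only fits if each cell holds exactly one of the phrases and nothing else.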

You can make this more Pythonic by using a list comprehension instead of a for loop. For example, if you are looking for the strings 'aa' or 'bb' in the first column, you could do

matches = [row for row in reader if 'aa' in row[0] or 'bb' in row[0]]

I'm not sure this version is better, just shorter. Anyway, I hope it helps.

import csv

total = 0

keys = ['a', 'b', 'c']
with open('infile', 'rb') as infile, open('outfile', 'wb') as outfile:
    rows = [x for x in csv.reader(infile) if any([k in x[0] for k in keys])]
    csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)

print 'Total: %d' % len(rows)
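The snippet above is Python 2. Under Python 3 the csv module expects files opened in text mode with newline='', and print is a function. A rough port might look like the following; the file names and the sample data are made up, and the input file is created in a temporary directory only so that the sketch runs on its own.

```python
import csv
import os
import tempfile

keys = ['a', 'b', 'c']

# Build a small sample input file; paths and contents are hypothetical.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'infile.csv')
out_path = os.path.join(workdir, 'outfile.csv')
with open(in_path, 'w', newline='') as f:
    csv.writer(f).writerows([['apple', '1'], ['kiwi', '2'], ['bean', '3']])

# Text mode with newline='' replaces the 'rb'/'wb' modes used in Python 2.
with open(in_path, newline='') as infile, open(out_path, 'w', newline='') as outfile:
    # Keep rows whose first column contains any of the keys.
    rows = [row for row in csv.reader(infile) if any(k in row[0] for k in keys)]
    writer = csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerows(rows)

print('Total: %d' % len(rows))
```

Passing a generator expression to any, instead of a list, lets it stop at the first key that matches.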
