Is there a better way to read an element from a file in Python?

Question

I have written a crude Python program that searches one column of a CSV file for certain phrases and writes the matching rows to another file.

import csv

total = 0

ifile = open('data.csv', "rb")
reader = csv.reader(ifile)

ofile = open('newdata_write.csv', "wb")
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for row in reader:
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1   
        writer.writerow(row) 
    elif ("even more text I'm looking for") in row[x]:  
        total = total + 1   
        writer.writerow(row)

   < many, many more lines >

print "\nTotal = %d." % total

ifile.close()
ofile.close()

My question is this: Isn't there a better (more elegant, less verbose) Pythonic way to do this? I feel this is a case of not knowing what I don't know. The CSV file I'm searching is not large (3863 lines, 669 KB), so I don't think it is necessary to use SQL to solve this, although I am certainly open to that.

I am a Python newbie, in love with the language and teaching myself through the normal channels (books, tutorials, Project Euler, Stack Overflow).

Any suggestions are greatly appreciated.

Answer 1

You're looking for any() with a generator expression:

matches = "some text", "some more text", "even more text I'm looking for"
for row in reader:
    # test each phrase against the searched column, as in the question
    if any(match in row[x] for match in matches):
        total += 1
        writer.writerow(row)

Alternatively, you could just write all the rows at once:

writer.writerows(row for row in reader if any(match in row[x] for match in matches))

but as written that doesn't get you a total.
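
To get both the single pass and the count, one option is to wrap the loop in a small helper that returns the total. This is only a minimal sketch, assuming Python 3. The helper name copy_matching_rows, its column parameter (standing in for x above), and the in-memory sample data are all illustrative and not from the original post.

```python
import csv
import io


def copy_matching_rows(reader, writer, column, matches):
    """Write rows whose cell at `column` contains any phrase; return the count."""
    total = 0
    for row in reader:
        # any() stops at the first phrase found in the cell
        if any(match in row[column] for match in matches):
            writer.writerow(row)
            total += 1
    return total


# In-memory stand-ins for the input and output files (hypothetical sample data)
source = io.StringIO("1,some text here\n2,nothing relevant\n3,some more text\n")
target = io.StringIO()

matches = ("some text", "some more text", "even more text I'm looking for")
writer = csv.writer(target, delimiter="\t", quotechar='"', quoting=csv.QUOTE_ALL)
total = copy_matching_rows(csv.reader(source), writer, 1, matches)
print("Total = %d." % total)
```

With real files you would pass csv.reader and csv.writer objects built on opened file handles instead of the StringIO stand-ins.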

Answer 2

It's not a huge improvement, but you could do something like

keyphraseList = (
    "some text",
    "some more text",
    "even more text I'm looking for")

...
for row in reader:
    for phrase in keyphraseList:
        if phrase in row[x]:
            total = total + 1
            writer.writerow(row)
            break

(not tested)
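
The break matters in the loop above: without it, a row containing two of the phrases would be counted and written once per phrase. Here is a small illustration, assuming Python 3. The count_writes helper and the sample cells are made up for the demonstration.

```python
phrases = ("some text", "some more text")
cells = ["some text and some more text", "unrelated"]


def count_writes(cells, phrases, stop_at_first):
    """Count how many writes would happen, with or without the break."""
    writes = 0
    for cell in cells:
        for phrase in phrases:
            if phrase in cell:
                writes += 1
                if stop_at_first:
                    # mirrors the break in the loop above
                    break
    return writes


with_break = count_writes(cells, phrases, True)
without_break = count_writes(cells, phrases, False)
print(with_break, without_break)
```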

Answer 3

Not necessarily 'better', but I would compare the item to a set and clean up the total a bit. It is at least more succinct.

This

for row in reader:
    if ("some text") in row[x]:
        total = total + 1
        writer.writerow(row)
    elif ("some more text") in row[x]:
        total = total + 1   
        writer.writerow(row) 
    elif ("even more text I'm looking for") in row[x]:  
        total = total + 1   
        writer.writerow(row)

becomes

myWords = set(('some text', 'some more text', 'even more'))
for row in reader:
    if row[x] in myWords:
        total += 1
        writer.writerow(row)

You could just use a plain list, but membership tests on a set are faster as the collection grows.

In response to the comment by agf:

>>> x = set(('something','something else'))
>>> True if 'some' in x else False
False
>>> True if 'something' in x else False
True

Is this what you're saying would not work?
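
The session above shows the catch: membership in a set is an exact comparison, so 'some' is not found even though 'something' starts with it. The question's code tests whether a phrase occurs inside a cell, which is a different check. A minimal sketch of the difference, assuming Python 3 and a made-up cell value:

```python
phrases = {"some text", "some more text"}
cell = "this cell has some text in it"  # hypothetical cell contents

# Set membership: True only if the whole cell equals one of the phrases
exact_match = cell in phrases

# Substring test, as in the question: True if any phrase occurs inside the cell
substring_match = any(phrase in cell for phrase in phrases)

print(exact_match, substring_match)  # prints: False True
```

So the set version only works if each cell is expected to hold exactly one of the phrases and nothing else.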

Answer 4

You can make this more Pythonic by using a list comprehension instead of a for loop. For example, if you are looking for the strings 'aa' or 'bb' in the first column, you could do:

matches = [row for row in reader if 'aa' in row[0] or 'bb' in row[0]]

Answer 5

I'm not sure this version is better, just shorter. Anyway, hope it helps.

import csv

keys = ['a', 'b', 'c']
with open('infile', 'rb') as infile, open('outfile', 'wb') as outfile:
    # keep rows whose first column contains any of the keys
    rows = [x for x in csv.reader(infile) if any(k in x[0] for k in keys)]
    csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)

print 'Total: %d' % len(rows)
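
The code above is Python 2 (binary file modes and the print statement). On Python 3 the csv module expects text-mode files opened with newline='', so a port might look roughly like the sketch below. The function name filter_csv, its column parameter, and the temporary sample file are assumptions for illustration, not part of the original answer.

```python
import csv
import os
import tempfile


def filter_csv(in_path, out_path, keys, column=0):
    """Copy rows whose cell at `column` contains any key; return how many."""
    # Python 3's csv module wants text-mode files opened with newline=''
    with open(in_path, newline="") as infile, \
            open(out_path, "w", newline="") as outfile:
        rows = [row for row in csv.reader(infile)
                if any(k in row[column] for k in keys)]
        writer = csv.writer(outfile, delimiter="\t", quotechar='"',
                            quoting=csv.QUOTE_ALL)
        writer.writerows(rows)
    return len(rows)


# Hypothetical sample input written to a temporary directory
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "infile.csv")
out_path = os.path.join(workdir, "outfile.csv")
with open(in_path, "w", newline="") as f:
    f.write("apple,1\nxyz,2\ncherry,3\n")

total = filter_csv(in_path, out_path, ["a", "b", "c"])
print("Total: %d" % total)
```

Building the full list keeps the count trivial via len(), which is fine for a file of a few thousand lines like the one in the question.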

5 answers

solution1
6 2011-10-05 16:17:16

solution2
1 ACCEPTED 2011-10-05 16:14:29

solution3
0 2011-10-05 16:16:37

solution4
0 2011-10-05 16:20:29

solution5
0 2011-10-05 16:33:47
