Is there a better way to read an element from a file in Python?

I have written a crude Python program to pull phrases from an index in a CSV file and write these rows to another file.

import csv

total = 0

# x is the index of the column being searched; it is not defined in this excerpt
ifile = open('data.csv', newline='')
reader = csv.reader(ifile)

ofile = open('newdata_write.csv', 'w', newline='')
writer = csv.writer(ofile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL)

for row in reader:
    if "some text" in row[x]:
        total = total + 1
        writer.writerow(row)
    elif "some more text" in row[x]:
        total = total + 1
        writer.writerow(row)
    elif "even more text I'm looking for" in row[x]:
        total = total + 1
        writer.writerow(row)
    # < many, many more elif branches >

print("\nTotal = %d." % total)

ifile.close()
ofile.close()  # close the output too, or buffered rows may never reach the file

My question is this: isn't there a better (more elegant/less verbose) Pythonic way to do this? I feel this is a case of not knowing what I don't know. The CSV file I'm searching is not large (3863 lines, 669 KB), so I don't think it is necessary to use SQL to solve this, although I am certainly open to that.

I am a Python newbie, in love with the language and teaching myself through the normal channels (books, tutorials, Project Euler, Stack Overflow).

Any suggestions are greatly appreciated.

You're looking for any() with a generator expression:

matches = "some text", "some more text", "even more text I'm looking for"
for row in reader:
    # test each phrase as a substring of the searched column, as in the question
    if any(match in row[x] for match in matches):
        total += 1
        writer.writerow(row)
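A self-contained sketch of the same any() filter, assuming an in-memory CSV in place of data.csv and a hypothetical COLUMN index standing in for the question's x. filter_rows is a made-up helper name:

```python
import csv
import io

# Hypothetical sample data standing in for data.csv.
SAMPLE = """id,text
1,this has some text in it
2,nothing to see here
3,even more text I'm looking for
"""

COLUMN = 1  # hypothetical: index of the column to search (the question's x)
matches = ("some text", "some more text", "even more text I'm looking for")


def filter_rows(lines, column, phrases):
    """Yield rows whose `column` field contains any of `phrases` as a substring."""
    for row in csv.reader(lines):
        # any() stops at the first phrase that matches, so each row is yielded at most once.
        if any(phrase in row[column] for phrase in phrases):
            yield row


selected = list(filter_rows(io.StringIO(SAMPLE), COLUMN, matches))
print(len(selected))  # → 2
```

Because any() short-circuits, a row containing several phrases is still written only once, which matches the behaviour of the original if/elif chain.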

Alternatively, you could just write all the rows at once:

writer.writerows(row for row in reader if any(match in row[x] for match in matches))

but as written, that doesn't give you a total.
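One way to keep the single writerows call and still get a total is to build the matching rows as a list first, then write and count that list. A minimal sketch, assuming an in-memory CSV in place of data.csv and a hypothetical column index of 1 for the question's x:

```python
import csv
import io

phrases = ("some text", "some more text", "even more text I'm looking for")
column = 1  # hypothetical column index (the question's x)

reader = csv.reader(io.StringIO("1,some text here\n2,no match\n3,some more text\n"))
out = io.StringIO()  # stands in for the output file
writer = csv.writer(out, delimiter="\t", quotechar='"', quoting=csv.QUOTE_ALL)

# Build the list once, then both write it and count it.
selected = [row for row in reader if any(p in row[column] for p in phrases)]
writer.writerows(selected)
total = len(selected)
print("Total = %d." % total)  # → Total = 2.
```

For a file of the size mentioned in the question (3863 lines), holding the matched rows in memory should be cheap.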

It's not a huge improvement, but you could do something like this:

keyphraseList = (
    "some text",
    "some more text",
    "even more text I'm looking for")

...
for row in reader:
    for phrase in keyphraseList:
        if phrase in row[x]:
            total = total + 1
            writer.writerow(row)
            break  # stop at the first matching phrase so the row is written only once

(not tested)
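The break matters when a field contains more than one key phrase: without it, the inner loop would write (and count) the same row once per matching phrase. A small sketch with a made-up field that contains two of the phrases:

```python
phrases = ("some text", "some more text")
field = "some text and some more text"  # hypothetical field containing BOTH phrases

# Without break, every matching phrase triggers another write.
writes_without_break = 0
for phrase in phrases:
    if phrase in field:
        writes_without_break += 1

# With break, the inner loop stops at the first match, so the row is written once.
writes_with_break = 0
for phrase in phrases:
    if phrase in field:
        writes_with_break += 1
        break

print(writes_without_break, writes_with_break)  # → 2 1
```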

Not necessarily better, but more succinct: I would compare the item against a set and clean up the total counter a bit.

This

for row in reader:
    if "some text" in row[x]:
        total = total + 1
        writer.writerow(row)
    elif "some more text" in row[x]:
        total = total + 1
        writer.writerow(row)
    elif "even more text I'm looking for" in row[x]:
        total = total + 1
        writer.writerow(row)

becomes

myWords = {'some text', 'some more text', "even more text I'm looking for"}
for row in reader:
    if row[x] in myWords:
        total += 1
        writer.writerow(row)

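You could just use a simple list, but a set's membership test stays fast as the number of phrases grows, whereas a list is searched item by item. Note that, unlike the original code, row[x] in myWords only matches a field that is exactly equal to one of the phrases; it does not find a phrase embedded in a longer field.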
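Set membership is an average O(1) hash lookup, while a list is scanned element by element, so the gap grows with the number of phrases rather than with memory use. With only three phrases the difference is negligible. A rough timing sketch (the 10,000 phrases are made up, and absolute timings will vary by machine):

```python
import timeit

# Hypothetical collection of 10,000 distinct phrases.
phrases_list = ["phrase %d" % i for i in range(10_000)]
phrases_set = set(phrases_list)
target = "phrase 9999"  # worst case for the list: it is the last element

# Both containers give the same answer...
assert (target in phrases_list) == (target in phrases_set)

# ...but the list is scanned element by element, while the set does a hash lookup.
list_time = timeit.timeit(lambda: target in phrases_list, number=1_000)
set_time = timeit.timeit(lambda: target in phrases_set, number=1_000)
print("list: %.4fs  set: %.4fs" % (list_time, set_time))
```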

In response to the comment by agf:

>>> x = set(('something', 'something else'))
>>> True if 'some' in x else False
False
>>> True if 'something' in x else False
True

Is this what you're saying would not work?
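The two tests behave differently: set membership compares the whole string, while the question's in row[x] searches for a phrase inside a field. A small sketch contrasting them, with a hypothetical field value:

```python
my_words = {"some text", "some more text", "even more text I'm looking for"}
field = "a row that contains some text somewhere"  # hypothetical field value

# Set membership: is the whole field equal to one of the phrases?
print(field in my_words)  # → False

# Substring search, as in the question: does any phrase occur inside the field?
print(any(word in field for word in my_words))  # → True
```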

You can make this more Pythonic by using list comprehensions instead of for loops. For example, if you are looking for the index strings 'aa' or 'bb' in the first column, you could do

matches = [row for row in reader if 'aa' in row[0] or 'bb' in row[0]]

I'm not sure this version is better, just shorter; anyway, I hope it helps:

import csv

keys = ['a', 'b', 'c']
# Python 3: open CSV files in text mode with newline='', as the csv docs recommend
with open('infile', newline='') as infile, open('outfile', 'w', newline='') as outfile:
    # keep rows whose first column contains any of the keys
    rows = [x for x in csv.reader(infile) if any(k in x[0] for k in keys)]
    csv.writer(outfile, delimiter='\t', quotechar='"', quoting=csv.QUOTE_ALL).writerows(rows)

print('Total: %d' % len(rows))
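If the phrase list grows to the "many, many more lines" from the question, a different technique (not one of the answers here) is to compile all phrases into one regular expression with the standard re module and call search() once per field. A sketch with made-up fields; whether it is faster than any() depends on the data, so treat it as an option to measure rather than a guaranteed win:

```python
import re

keys = ["some text", "some more text", "even more text I'm looking for"]

# re.escape makes each phrase match literally; "|" joins them into one alternation.
pattern = re.compile("|".join(re.escape(k) for k in keys))

fields = ["has some text", "unrelated", "even more text I'm looking for!"]  # hypothetical
hits = [f for f in fields if pattern.search(f)]
print(hits)  # → ['has some text', "even more text I'm looking for!"]
```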
