
Is there a better way to parse a file in Python?

I am looking for a better way to parse a huge file. The following is an example of the file.

sample.txt

'abcdefghi'
'xyzwfg'
'lmnop'

I want to check that both 'abc' and 'xyz' occur in the file at least once.

I was able to find them, but I am looking for a better way. The following is my code:

datafile = open('sample.txt')
abc = 0
xyz = 0

# First loop: stop at the first line containing 'abc'
for line in datafile:
    if 'abc' in line:
        abc += 1
        break
# Second loop: stop at the first line containing 'xyz'
for line in datafile:
    if 'xyz' in line:
        xyz += 1
        break

if (abc + xyz) >= 2:
    print('found')
else:
    print('fail')

I am running the loop twice. Is there a better way to parse the file?

Your current code will produce incorrect results if 'xyz' occurs before 'abc', because the second loop resumes reading from where the first loop stopped instead of starting over. To fix this, test for both strings in the same loop.

with open('sample.txt') as datafile:
    abc_found = False
    xyz_found = False

    for line in datafile:
        if 'abc' in line:
            abc_found = True
        if 'xyz' in line:
            xyz_found = True
        if abc_found and xyz_found: 
            break # stop looking if both found
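
The same single-pass idea can be wrapped in a reusable function. This is only a sketch: the name `contains_all` and its parameters are illustrative assumptions, not part of the original answer. It keeps a set of strings that have not been seen yet and stops reading as soon as that set is empty.

```python
def contains_all(path, needles):
    """Return True if every string in `needles` occurs in at least one line.

    `contains_all`, `path`, and `needles` are hypothetical names used for
    illustration only.
    """
    remaining = set(needles)
    with open(path) as datafile:
        for line in datafile:
            # Keep only the strings that this line does not contain.
            remaining = {n for n in remaining if n not in line}
            if not remaining:
                return True  # everything found; stop reading early
    # Reached end of file: succeed only if nothing is left to find.
    return not remaining
```

With the sample file from the question, `contains_all('sample.txt', ['abc', 'xyz'])` would give `True` regardless of which string appears first.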

"Better" is subjective, and no metrics are provided to define it. However, a simple optimization is to test for both strings in a single loop:

for line in datafile:
    if 'abc' in line:
        abc += 1
    if 'xyz' in line:
        xyz += 1

If the actual problem is that the file is very large, you will want to read only one line at a time:

abc = 0
xyz = 0
with open('myTextFile.txt', 'r') as f:
    line = f.readline()
    while line:  # readline() returns '' at end of file
        if 'abc' in line:
            abc += 1
        if 'xyz' in line:
            xyz += 1
        line = f.readline()

The result is the number of lines in which 'abc' and 'xyz' occur, respectively. If the idea is to quit as soon as a single matching line is found, then including the break is appropriate.
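
As a rough sketch of the counting variant, the function below returns the per-string line counts in a dictionary. The name `count_matching_lines` and its parameters are assumptions made for illustration. Note that iterating over a file object with `for line in datafile` also reads one line at a time, so it should handle large files without loading them fully into memory.

```python
def count_matching_lines(path, needles):
    """Count, for each string in `needles`, the lines that contain it.

    `count_matching_lines`, `path`, and `needles` are hypothetical names
    used for illustration only.
    """
    counts = {needle: 0 for needle in needles}
    with open(path) as datafile:
        # The file object yields one line per iteration (lazy reading).
        for line in datafile:
            for needle in counts:
                if needle in line:
                    counts[needle] += 1
    return counts
```

If only presence matters, checking `all(counts.values())` on the result is enough, although this version always reads the whole file; the early `break` approach avoids that.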


 