Python script to remove lines from file containing words in array

I have the following script, which identifies the lines I want to remove from a file based on an array of words, but it does not remove them.

What should I change?

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin.readlines(): 
        for item in offending: 
                print "got one",line 
                line = line.replace( item, "MUST DELETE" ) 
                line=line.strip()
                fout.write(line)  
    fin.close() 
    fout.close() 

fixup(sourcefile)

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin: 
        if True in [item in line for item in offending]:
            continue
        fout.write(line)
    fin.close() 
    fout.close() 

fixup(sourcefile)

EDIT: Or even better:

for line in fin: 
    if not True in [item in line for item in offending]:
        fout.write(line)
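
The list comprehension above builds a complete list of booleans before testing it. As a minimal sketch of the same filter, the built-in any() stops at the first match. This is written for Python 3, and the helper names keep_line and filter_lines are only illustrative; they are not part of the original answer.

```python
offending = ["Exception", "Integer", "RuntimeException"]

def keep_line(line, words=offending):
    """Return True when none of the words occurs anywhere in the line."""
    return not any(word in line for word in words)

def filter_lines(lines, words=offending):
    """Yield only the lines that contain none of the words (substring match)."""
    for line in lines:
        if keep_line(line, words):
            yield line

# Hypothetical sample data standing in for the contents of the input file.
sample = ["int x = 1;\n", "throw new RuntimeException();\n", "Integer y;\n", "done\n"]
kept = list(filter_lines(sample))
```

Because filter_lines accepts any iterable of lines, an open file object can be passed to it directly and the result written out with fout.writelines().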

The basic strategy is to write a copy of the input file to the output file, but with changes. In your case, the changes are very simple: you just omit the lines you don't want.

Once your copy is safely written, you can delete the original file and use 'os.rename()' to rename the temp file to the original file name. I like to write the temp file in the same directory as the original file, to make sure I have permission to write in that directory and because I don't know if os.rename() can move a file from one volume to another.
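
The write-a-copy-then-rename strategy could look roughly like the following sketch. It assumes Python 3.3 or later, where os.replace() overwrites an existing destination, so the separate delete step is not needed. The function name remove_offending_lines is hypothetical, and the temp file is deliberately created next to the original so the final rename stays on one volume.

```python
import os
import tempfile

def remove_offending_lines(path, offending):
    """Rewrite `path` in place, dropping lines that contain any offending word."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory as the original file.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp_name = tmp.name
        with open(path) as fin:
            for line in fin:
                if not any(word in line for word in offending):
                    tmp.write(line)
    # Both files are closed here; swap the filtered copy into place.
    os.replace(tmp_name, path)
```

This sketch does no cleanup if an error occurs partway through; a more careful version would remove the temp file in that case.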

You don't need to say for line in fin.readlines(); it is enough to say for line in fin. When you use .readlines() you are telling Python to read every line of the input file, all at once, into memory; when you just use fin by itself you read one line at a time.
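
A small sketch of the difference, using an in-memory io.StringIO object as a stand-in for a real file (an assumption made only so the example is self-contained; Python 3):

```python
import io

text = "first\nsecond\nthird\n"

# readlines() builds a list holding every line before any loop starts.
all_lines = io.StringIO(text).readlines()

# Iterating over the file object hands back one line per step.
stream = io.StringIO(text)
first = next(stream)   # only the first line has been consumed so far
rest = list(stream)    # the remaining lines, read on demand
```

Both approaches see the same lines in the same order; the difference is only in how much is held in memory at once.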

Here is your code, modified to make these changes.

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def line_offends(line, offending):
    for word in line.split():
        if word in offending:
            return True
    return False

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin:
        if line_offends(line, offending):
            continue
        fout.write(line)
    fin.close()
    fout.close()
    #os.rename() left as an exercise for the student

fixup(sourcefile)

If line_offends() returns True, we execute continue and the loop moves on to the next line without executing the rest of the loop body. That means the line never gets written. For this simple example, it would really be just as good to do it this way:

    for line in fin:
        if not line_offends(line, offending):
            fout.write(line)

I wrote it with the continue because often there is non-trivial work being done in the main loop, and you want to avoid all of it if the test is true. IMHO it is nicer to have a simple "if this line is unwanted, continue" rather than indenting a whole bunch of stuff inside an if for a condition that might be very rare.
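
One thing to be aware of: line_offends() above compares whole whitespace-separated words, whereas a plain in test on the line matches substrings. The sketch below (Python 3) contrasts the two; line_contains is a hypothetical name introduced only for the comparison, and the sample line is made up.

```python
offending = ["Exception", "Integer", "RuntimeException"]

def line_offends(line, offending):
    """Whole-word test: a word must equal an entry in the list exactly."""
    for word in line.split():
        if word in offending:
            return True
    return False

def line_contains(line, offending):
    """Substring test: an entry may appear anywhere inside the line."""
    return any(word in line for word in offending)

# "java.lang.RuntimeException:" is a single token, so it is not an exact match.
line = "java.lang.RuntimeException: boom"
whole_word = line_offends(line, offending)
substring = line_contains(line, offending)
```

Which behaviour is right depends on the input file: the whole-word test skips tokens with attached punctuation, while the substring test also hits longer identifiers that merely contain one of the words.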

You're not writing it to the output file. Also, I would use "in" to check whether the string exists in the line. See the modified script below (not tested):

sourcefile = "C:\\Python25\\PC_New.txt" 
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 

    for line in fin.readlines(): 
        if not any(word in line for word in offending):
            # There are no offending words in this line
            # write it to the output file
            fout.write(line)

    fin.close() 
    fout.close() 

fixup(sourcefile)

This is a rather simple implementation, but it should do what you are looking for:

sourcefile = "C:\\Python25\\PC_New.txt"
filename2 = "C:\\Python25\\PC_reduced.txt"

offending = ["Exception","Integer","RuntimeException"]

def fixup( filename ): 
    print "fixup ", filename 
    fin = open( filename ) 
    fout = open( filename2 , "w") 
    for line in fin.readlines(): 
        # Check the line against every offending word
        found = False
        for item in offending: 
            if item in line:
                print "got one", line 
                found = True
                break
        # Only lines without an offending word are copied to the output
        if not found:
            fout.write(line)  
    fin.close() 
    fout.close() 

fixup(sourcefile)
