
Removing specific lines from inside a file in Python

I have a list of documents in a file. It is basically the TDT2 corpus, consisting of both Mandarin and English files. I want to keep only the English documents and remove the Mandarin ones. Doing so manually would take very long since the file is huge.

The structure looks something like this:

<ONTOPIC topicid=20001 level=YES docno=VOA19980630.1800.3165 fileid=19980630_1800_1900_VOA_ENG comments="NO">
<ONTOPIC topicid=20001 level=BRIEF docno=VOM19980220.0700.0559 fileid=19980220_0700_0800_VOA_MAN comments="NO">
<ONTOPIC topicid=20001 level=YES docno=VOM19980220.0700.1159 fileid=19980220_0700_0800_VOA_MAN comments="NO">

So I want to remove the entries that have 'MAN' in their fileid. How can I do this specific task in Python?

If the lines are not terminated with \n, just remove it from the endswith clause. This skips every entry that ends with MAN comments="NO"> and writes the remaining entries to the output file.

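# Open the output in text mode ('w'); in Python 3, 'wb' would reject str lines.
with open('file.txt') as src, open('file2.txt', 'w') as out:
    for line in src:
        # Skip entries whose fileid ends in MAN; copy everything else.
        if not line.endswith('MAN comments="NO">\n'):
            out.write(line)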
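
The newline caveat above can be sidestepped by stripping trailing whitespace before the comparison, so the same check works for \n, \r\n, or a final line with no newline at all. A minimal sketch; the helper names ends_with_man and drop_mandarin are my own and not part of the original answer:

```python
def ends_with_man(line: str) -> bool:
    """True if the line, ignoring trailing whitespace, ends with the MAN marker."""
    return line.rstrip().endswith('MAN comments="NO">')


def drop_mandarin(lines):
    """Return only the lines that do not end with the MAN marker."""
    return [line for line in lines if not ends_with_man(line)]
```

Because the function accepts any iterable of lines, it can be fed an open file object directly and its result passed to writelines() on the output file.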

If you are sure 'MAN' appears only in the Mandarin entries, a plain substring test looks a bit cleaner.

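# Text mode ('w') is required to write str lines in Python 3.
with open('file.txt') as src, open('file2.txt', 'w') as out:
    for line in src:
        # Keep only the lines that do not mention MAN anywhere.
        if 'MAN' not in line:
            out.write(line)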
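
If you are not sure that 'MAN' can only show up in the fileid, a slightly stricter check is to pull out the fileid attribute and test just that. This is only a sketch and assumes every entry looks like the sample in the question (an unquoted fileid=... value with no spaces, ending in _MAN for Mandarin); the names FILEID_RE and is_mandarin are hypothetical:

```python
import re

# Assumption: the fileid value is unquoted and contains no whitespace,
# as in the sample lines from the question.
FILEID_RE = re.compile(r'fileid=(\S+)')


def is_mandarin(line: str) -> bool:
    """True if the line has a fileid attribute whose value ends in _MAN."""
    match = FILEID_RE.search(line)
    return bool(match) and match.group(1).endswith('_MAN')
```

Lines without a fileid attribute are treated as non-Mandarin and would therefore be kept.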

You can try this:

def start():
    sFile = "source.txt"
    dFile = "results.txt"
    with open(dFile, 'w') as dHandle:
        with open(sFile, "r") as fhandle:
            # Iterate over the file directly; readlines() would load the
            # whole (huge) file into memory first.
            for fline in fhandle:
                if "MAN" not in fline:
                    dHandle.write(fline)

start()
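
All of the snippets above write to a second file. If the goal is to end up with the original file itself cleaned, one common pattern is to write to a temporary file in the same directory and then swap it in. A rough sketch under that assumption; the function name remove_lines_in_place and its marker parameter are made up for illustration:

```python
import os
import tempfile


def remove_lines_in_place(path, marker="MAN"):
    """Rewrite `path` so that lines containing `marker` are dropped."""
    # Create the temp file next to the target so os.replace stays on
    # one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                if marker not in line:
                    tmp.write(line)
        # Swap the filtered copy in place of the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise.
        os.remove(tmp_path)
        raise
```

The original file is only replaced after the filtered copy has been written completely, so an error halfway through leaves the source untouched.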
