
Python: Search for a word and delete the full line in a .txt file

I have a text file that sometimes contains one line too many, and I have to delete it. The line is not always there, but I still need to check for it every time.

The line always begins with the same words, but the end of the line may differ. Either way, the whole line needs to be deleted.

Example:

These are the original lines in the middle of the .txt file:

.........
<br>rrrrr TTTTTT ffgggggggg
<br>ja UOOOOOOOO on >= 16 täysin.
<br>ja numeroyhdistelmä on 9- 39- 9
<br>ja href="./reeeee.html">wwwwjjhjhkkghjky. </a> </td>
</tr></TABLE>
<table border=0 cellpadding= 25 width= 560><TR><TD width=80></TD><TD 
width=240><PRE>
.........   

After running the Python code, the lines should be:

.........
<br>rrrrr TTTTTT ffgggggggg
<br>ja UOOOOOOOO on >= 16 täysin.
<br>ja href="./reeeee.html">wwwwjjhjhkkghjky. </a> </td>
</tr></TABLE>
<table border=0 cellpadding= 25 width= 560><TR><TD width=80></TD><TD 
width=240><PRE> 
.........

So the line that needs to be deleted is:

    <br>ja numeroyhdistelmä on 9- 39- 9

If I use the letter "ä" in the code, it gives some "unicode" errors, but I have no choice of another word to search for, because the words at the beginning of the line also occur elsewhere and the values "9- 39- 9" will probably change.

This is what I tried:

f = open("text2.txt","r+")
d = f.readlines()
f.seek(0)
for line in d:
    if "numeroyhdistelmä" in line:
        f.write(line)
f.truncate()
f.close()

I think the letter "ä" is not the only problem, because when I tested this code with some other search word, it deleted all the lines in the text file.

Thanks!

I would read the lines and check each one: if the word to delete is in the line, skip that line; otherwise write it to the file.

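# Read every line into memory first
with open("file") as data:
    lines = data.readlines()

# Reopen for writing and copy back only the lines to keep
with open("file","w") as f:
    for line in lines:
        if "word to remove" in line:
            continue
        # Lines from readlines() already end with a newline
        f.write(line)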
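The same read, filter, and rewrite idea can be wrapped in a small function. This is only a sketch: the function name `remove_lines_containing`, the `encoding` parameter, and the throwaway demo file are illustrative assumptions, not part of the answer above, and it assumes Python 3.

```python
import os
import tempfile


def remove_lines_containing(path, word, encoding="utf-8"):
    """Rewrite the file at `path`, dropping every line that contains `word`.

    Returns the number of lines that were removed.
    """
    with open(path, encoding=encoding) as src:
        lines = src.readlines()

    kept = [line for line in lines if word not in line]

    # Opening with "w" empties the file, so no truncate() is needed
    with open(path, "w", encoding=encoding) as dst:
        dst.writelines(kept)

    return len(lines) - len(kept)


# Demo on a temporary file so no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("keep me\nword to remove here\nkeep me too\n")

    removed = remove_lines_containing(demo_path, "word to remove")

    with open(demo_path, encoding="utf-8") as f:
        result = f.read()
```

Returning the count of removed lines makes it easy to tell whether the unwanted line was present on a given run, since the question says it is not always there.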

Here is how I might solve this problem. See also this question about the with syntax, which is the preferred way to open and close a file: Why is with open() better for opening files in Python?

filename = 'text2.txt'
with open(filename, 'r+') as txt_file:
    temp = txt_file.readlines()
    txt_file.seek(0)

    for line in temp:
        if 'numeroyhdistelm' not in line:
            txt_file.write(line)

    txt_file.truncate()
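Because the question says the unwanted line always begins with the same words, matching on the start of the line is another option. The sketch below assumes the prefix is `<br>ja numeroyhdistelm` (taken from the sample data, with the "ä" left off as in the code above); the `PREFIX` constant and the `drop_prefixed` helper are hypothetical names, and it works on an in-memory list rather than a file.

```python
# Hypothetical prefix, based on the sample line in the question
PREFIX = "<br>ja numeroyhdistelm"

# Sample lines copied from the question
sample = [
    "<br>rrrrr TTTTTT ffgggggggg\n",
    "<br>ja UOOOOOOOO on >= 16 täysin.\n",
    "<br>ja numeroyhdistelmä on 9- 39- 9\n",
    '<br>ja href="./reeeee.html">wwwwjjhjhkkghjky. </a> </td>\n',
]


def drop_prefixed(lines, prefix=PREFIX):
    """Return the lines that do not start with `prefix` (leading spaces ignored)."""
    return [line for line in lines if not line.lstrip().startswith(prefix)]


filtered = drop_prefixed(sample)
```

Other lines that start with `<br>ja` are kept, because the prefix also includes the start of the distinguishing word.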

You are currently saving only the lines that contain 'numeroyhdistelmä'; you should add 'not' to the condition in the loop. It is also better practice to use with open() than open() and close().

wordFlag = 'numeroyhdistelmä'
# Read all lines first
with open("text2.txt","r") as f:
    lines = f.readlines()

# Reopening with "w" empties the file; write back only the lines to keep
with open("text2.txt","w") as f:
    for line in lines:
        if wordFlag not in line:
            f.write(line)
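The effect of the missing 'not' can be seen on a plain list, without touching any file. This is a small illustration with made-up sample lines. It also shows why testing the original condition with a word that does not occur in the file wiped every line.

```python
lines = ["alpha\n", "numeroyhdistelmä on 9- 39- 9\n", "beta\n"]
word = "numeroyhdistelmä"

# Condition from the question: keeps ONLY the lines that contain the word
kept_by_original = [line for line in lines if word in line]

# Corrected condition: keeps every line EXCEPT those containing the word
kept_by_fix = [line for line in lines if word not in line]

# With a search word that appears nowhere, the original condition keeps nothing,
# so the rewritten file ends up empty
other_word = "not-in-file"
kept_with_other_word = [line for line in lines if other_word in line]
```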

You are getting the encoding error because the text2.txt file is not UTF-8 encoded. If you care about special characters, you should decode the file when opening it. There are encode() and decode() functions available for strings, but I prefer to use the codecs module. I am guessing the encoding of your file is Latin-1, but you should check it and change the variable if needed. Your code would then look like:

import codecs

encoding = 'latin-1'  # a guess; change this to match your file
wordFlag = 'numeroyhdistelmä'
with codecs.open('text2.txt', 'r', encoding) as f:
    lines = f.readlines()

# Write back with the same encoding so special characters survive
with codecs.open('text2.txt', 'w', encoding) as f:
    for line in lines:
        if wordFlag not in line:
            f.write(line)
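In Python 3, the built-in open() accepts an encoding argument directly, so the codecs module is optional there. The sketch below assumes Python 3 and keeps the guess that the file is Latin-1. It creates its own sample file in a temporary directory, so the path and the sample contents are illustrative only.

```python
import os
import tempfile

ENCODING = "latin-1"  # assumption carried over from the answer; verify for your file
WORD = "numeroyhdistelmä"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "text2.txt")

    # Create a Latin-1 encoded sample file to work on
    with open(path, "w", encoding=ENCODING) as f:
        f.write(
            "<br>ja UOOOOOOOO on >= 16 täysin.\n"
            "<br>ja numeroyhdistelmä on 9- 39- 9\n"
            "</tr></TABLE>\n"
        )

    # Decode while reading
    with open(path, encoding=ENCODING) as f:
        lines = f.readlines()

    # Encode with the same codec while writing back the lines to keep
    with open(path, "w", encoding=ENCODING) as f:
        f.writelines(line for line in lines if WORD not in line)

    # Read the raw bytes to confirm what ended up on disk
    with open(path, "rb") as f:
        raw = f.read()
```

Reading and writing with the same encoding keeps characters such as "ä" in the remaining lines intact.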

