
Completely deleting duplicate words in a text file

I have some words in a text file, for example:

joynal
abedin
rahim
mohammad
joynal
abedin 
mohammad
kudds

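I want to remove the duplicated names. Any name that appears more than once should be removed from the text file entirely, not reduced to a single copy.

The output should look like this: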

rahim
kudds

I tried some code, but it only gives me the duplicated values, e.g. 1. joynal 2. abedin

Edit: this is the code I tried:

content = open('file.txt' , 'r').readlines()
content_set = set(content)
cleandata = open('data.txt' , 'w')

for line in content_set:
    cleandata.write(line)

Use a Counter:

from collections import Counter 

with open(fn) as f:
    cntr=Counter(w.strip() for w in f)

Then print only the words whose count is 1:

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
rahim
kudds

Or do it the old-fashioned way, with a dict as the counter:

cntr={}
with open(fn) as f:
    for line in f:
        k=line.strip()
        cntr[k]=cntr.get(k, 0)+1

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
# same

If you want to write the output to a new file:

with open(new_file, 'w') as f_out:
    f_out.write('\n'.join(w for w,cnt in cntr.items() if cnt==1))
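
The pieces above can be put together into one self-contained function. This is only a sketch: the function name keep_unrepeated_lines and its two path parameters are made up for illustration, and it assumes one word per line, as in the question.

```python
from collections import Counter


def keep_unrepeated_lines(src_path, dst_path):
    """Write to dst_path only the lines that occur exactly once in src_path."""
    with open(src_path) as f_in:
        # strip() drops the newline and stray spaces (e.g. "abedin ");
        # blank lines are skipped
        counts = Counter(line.strip() for line in f_in if line.strip())

    # Counter is a dict subclass, so on Python 3.7+ keys keep first-seen order
    unique = [word for word, n in counts.items() if n == 1]

    with open(dst_path, "w") as f_out:
        f_out.write("\n".join(unique) + ("\n" if unique else ""))
    return unique
```

Calling keep_unrepeated_lines("file.txt", "data.txt") on the sample input from the question should leave only rahim and kudds in data.txt.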

with open("yourFile.txt") as f:               # open the file and read its content
    text = f.read()

word_list = text.split()                      # list of every word
# drop repeats, keeping the first occurrence of each word in its original order
word_list = list(dict.fromkeys(word_list))

with open("yourFile.txt", "w") as f:
    f.write(" ".join(word_list))              # write the words back, space-separated

For completeness, if you don't care about order:

with open(fn) as f:
    words = set(x.strip() for x in f)

with open(new_fn, "w") as f:
    f.write("\n".join(words))

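where fn is the file you want to read and new_fn is the file you want to write.

In general, think set when you need uniqueness, and remember that order is not guaranteed.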
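
Note that a set keeps one copy of every repeated word instead of dropping it, so it answers a slightly different question from the one asked. A small sketch of the difference, using the sample words inline (the variable names are illustrative, and sorted() is used here only to make the order reproducible):

```python
from collections import Counter

words = ["joynal", "abedin", "rahim", "mohammad",
         "joynal", "abedin", "mohammad", "kudds"]

# set(): one copy of every word survives; iteration order is arbitrary,
# so sort when a stable order matters
deduplicated = sorted(set(words))

# Counter: words that occur more than once are dropped entirely
counts = Counter(words)
unrepeated = sorted(w for w, n in counts.items() if n == 1)

print(deduplicated)  # five distinct words
print(unrepeated)    # only the words that occurred once
```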

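You can simply build a list: append a name if it is not in the list yet, and remove it if it is already there and shows up a second time.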

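with open("file1.txt", "r") as f, open("output_file.txt", "w") as g:
    output_list = []
    removed = set()  # names already dropped, so a third occurrence is not re-added
    for line in f:
        word = line.strip()
        if word in removed:
            continue
        if word not in output_list:
            output_list.append(word)
        else:
            output_list.remove(word)
            removed.add(word)

    g.write("\n".join(output_list))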

print(output_list)

['rahim', 'kudds']

In the output file, each name is on its own line, like this:

rahim
kudds

In my opinion, the Counter solution is still the more elegant way.
