Completely deleting duplicate words in a text file
I have some words in a text file, for example:
joynal
abedin
rahim
mohammad
joynal
abedin
mohammad
kudds
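I want to delete the duplicated names. Every name that occurs more than once should be removed from the text file completely.
The output should look like this: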
rahim
kudds
I have tried some code, but it only collapses the duplicates into a single value each, e.g. 1. joynal and 2. abedin.
Edit: this is the code I tried:
content = open('file.txt' , 'r').readlines()
content_set = set(content)
cleandata = open('data.txt' , 'w')
for line in content_set:
    cleandata.write(line)
Use a Counter:
from collections import Counter
with open(fn) as f:
    cntr=Counter(w.strip() for w in f)
Then print only the words whose count is 1:
>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
rahim
kudds
Or do it the "old-fashioned way", with a dict as the counter:
cntr={}
with open(fn) as f:
    for line in f:
        k=line.strip()
        cntr[k]=cntr.get(k, 0)+1
>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
# same
If you want to output to a new file:
with open(new_file, 'w') as f_out:
    f_out.write('\n'.join(w for w,cnt in cntr.items() if cnt==1))
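Putting the steps above together, here is a minimal sketch that reads one word per line, counts the words, and writes only those that occur exactly once, in their original order. The function name `write_unique_words` and its parameters are made up for illustration; they are not part of the answer above.

```python
from collections import Counter


def write_unique_words(src_path, dst_path):
    """Copy to dst_path only the words that occur exactly once in src_path."""
    with open(src_path) as f_in:
        # One word per line; skip blank lines so they are not counted.
        words = [line.strip() for line in f_in if line.strip()]
    counts = Counter(words)
    # Iterate over the original list so the input order is preserved.
    unique = [w for w in words if counts[w] == 1]
    with open(dst_path, "w") as f_out:
        f_out.write("\n".join(unique))
    return unique
```

Because the input is read completely and closed before the output is opened, passing the same path for both arguments should overwrite the original file in place.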
with open("yourFile.txt") as file:  # open the file
    text = file.read()  # read the whole content
word_list = text.split()  # list of every word
# dict.fromkeys keeps the first occurrence of each word, so a duplicated
# word is reduced to one copy rather than removed entirely
word_list = list(dict.fromkeys(word_list))
with open("yourFile.txt", "w") as file:
    file.write(" ".join(word_list))  # write the words back, space-separated
For completeness, if you don't care about order:
with open(fn) as f:
    words = set(x.strip() for x in f)
with open(new_fn, "w") as f:
    f.write("\n".join(words))
Here fn is the file you want to read and new_fn is the file you want to write.
In general, for uniqueness think set, and remember that order is not guaranteed.
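To make the difference concrete, the sketch below runs both ideas on the sample names from the question, held in an in-memory list rather than a file. A set keeps one copy of every name, whereas filtering on a count of 1 drops every name that was duplicated. The variable names are illustrative only.

```python
from collections import Counter

names = ["joynal", "abedin", "rahim", "mohammad",
         "joynal", "abedin", "mohammad", "kudds"]

# A set collapses duplicates to a single copy; sorted() gives a stable order.
one_copy_each = sorted(set(names))

# Counting first lets us discard every name that appears more than once.
counts = Counter(names)
only_singletons = sorted(n for n in names if counts[n] == 1)

print(one_copy_each)    # ['abedin', 'joynal', 'kudds', 'mohammad', 'rahim']
print(only_singletons)  # ['kudds', 'rahim']
```

So a plain set is the right tool for "one of each", but not for removing duplicated names entirely.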
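You can simply build a list: append a name the first time it appears, and remove it when it appears a second time.
with open("file1.txt", "r") as f, open("output_file.txt", "w") as g:
    output_list = []
    seen = set()  # every word encountered so far
    for line in f:
        word = line.strip()
        if word not in seen:
            seen.add(word)
            output_list.append(word)
        elif word in output_list:
            # second occurrence: drop it; later repeats are ignored
            output_list.remove(word)
    g.write("\n".join(output_list))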
print(output_list)
['rahim', 'kudds']
# in the output file there is one name per row, like this:
rahim
kudds
The Counter solution still seems to me the more elegant way.