How can I compare two txt files and find the words they have in common?

I need to know which English words were used in the Italian chat and count how many times each one was used.

But the output also contains words that were never used in the example chat (e.g. 'baby-blue-eyes': 0).

english_words = {}

# Put every dictionary word in the dict with a count of 0
with open("dizionarioen.txt") as f:
    for line in f:
        for word in line.strip().split():
            english_words[word] = 0

# Count each chat word that is also in the dictionary
with open("_chat.txt") as f:
    for line in f:
        for word in line.strip().split():
            if word in english_words:
                english_words[word] += 1

print(english_words)

You can simply filter your result and keep only the entries whose count is not 0:

english_words = {}

with open("dizionarioen.txt") as f:
    for line in f:
        for word in line.strip().split():
            english_words[word] = 0

with open("_chat.txt") as f:
    for line in f:
        for word in line.strip().split():
            if word in english_words:
                english_words[word] += 1

# Keep only the words with a non-zero count
result = {key: value for key, value in english_words.items() if value}
print(result)
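To try the same filtering logic without the two files, it can be wrapped in a function that takes any iterables of lines (an open file works, and so does a plain list). The name `count_known_words` and the sample data are hypothetical, just for illustration. Note that `str.split()` with no arguments already ignores leading and trailing whitespace, so the `.strip()` call is not strictly needed:

```python
def count_known_words(dictionary_lines, chat_lines):
    """Count how often each dictionary word appears in the chat.

    Hypothetical helper: both arguments are iterables of strings,
    e.g. open file objects or lists of lines.
    """
    # Start every dictionary word at 0
    counts = {}
    for line in dictionary_lines:
        for word in line.split():
            counts[word] = 0

    # Increment only the words that are in the dictionary
    for line in chat_lines:
        for word in line.split():
            if word in counts:
                counts[word] += 1

    # Keep only entries with a non-zero count
    return {word: n for word, n in counts.items() if n}


dictionary = ["hello baby-blue-eyes", "weekend"]
chat = ["ciao hello", "buon weekend hello"]
print(count_known_words(dictionary, chat))  # {'hello': 2, 'weekend': 1}
```

With real files you would pass the file objects directly, e.g. `count_known_words(f_dict, f_chat)` inside two `with open(...)` blocks.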

Another solution is to count the words with collections.Counter:

from collections import Counter

with open("dizionarioen.txt") as f:
    all_words = set(word for line in f for word in line.split())

with open("_chat.txt") as f:
    result = Counter([word for line in f for word in line.split() if word in all_words])

print(result)
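Because a `Counter` only stores words that actually appeared, this version never shows `0` entries, and looking up a word that never appeared returns `0` instead of raising `KeyError`. In all of these snippets, though, matching is exact: `Hello,` in the chat does not match `hello` in the dictionary. A possible refinement is to lowercase both sides and pull words out with a regular expression. The pattern and the function name `count_english_words` below are illustrative choices, not part of the original answer:

```python
import re
from collections import Counter

# Assumption: a "word" is a run of word characters, optionally joined by
# hyphens or apostrophes (e.g. "baby-blue-eyes"). \w also matches accented
# letters, so Italian "perché" is not cut down to the English word "perch".
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")


def count_english_words(dictionary_lines, chat_lines):
    """Hypothetical helper: count dictionary words in the chat, ignoring case and punctuation."""
    english = {word.lower() for line in dictionary_lines for word in line.split()}
    return Counter(
        word
        for line in chat_lines
        for word in WORD_RE.findall(line.lower())
        if word in english
    )


counts = count_english_words(["hello weekend"], ["Hello, amici!", "Buon weekend... HELLO"])
print(counts.most_common())       # [('hello', 2), ('weekend', 1)]
print(counts["baby-blue-eyes"])   # 0, no KeyError
```

`most_common()` returns the words sorted from most to least frequent, which is handy if you want a ranking of the English words used in the chat.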

If you would rather remove the words with no occurrences after counting, just delete those entries from the dictionary:

for w in list(english_words.keys()):
    if english_words[w] == 0:
        del english_words[w]

Then, your dictionary only contains words that occurred. Was that the question?
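The `list(...)` call in that loop matters: it takes a snapshot of the keys, because deleting from a dictionary while iterating over it directly raises `RuntimeError` in Python 3. A small demonstration with made-up counts:

```python
original = {"hello": 2, "baby-blue-eyes": 0, "weekend": 1}

# Deleting while iterating over the dict itself fails
counts = dict(original)
try:
    for w in counts:
        if counts[w] == 0:
            del counts[w]
except RuntimeError as exc:
    print("RuntimeError:", exc)  # dictionary changed size during iteration

# Iterating over a snapshot of the keys works
counts = dict(original)
for w in list(counts):
    if counts[w] == 0:
        del counts[w]
print(counts)  # {'hello': 2, 'weekend': 1}
```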
