How do I write the text file name before each word's frequency?

Question

How can I include the text file name with each word frequency, so that each entry shows the file name first and then the frequency of the word in that file? For example: { 'like': ['file1', 2, 'file2', 4] }. Here like is a word that both files contain, and I want file1 and file2 written before their respective frequencies. The solution should work for any number of files.

Here is my code:

file_list = [open(file, 'r') for file in files]
num_files = len(file_list)
wordFreq = {}
for i, f in enumerate(file_list):
    for line in f:
        for word in line.lower().split():
            if word not in wordFreq:
                wordFreq[word] = [0 for _ in range(num_files)]
            wordFreq[word][i] += 1

Answer 1

I know that my code is not pretty and not exactly what you want, but it is a solution. I would prefer using a dictionary instead of a list structure like ['file1', 2, 'file2', 4].

Let's define two files as an example:

file1.txt:

this is an example

file2.txt:

this is an example
but multi line example

Here is the solution:

from collections import Counter

filenames = ["file1.txt", "file2.txt"]

# First, find word frequencies in files
file_dict = {}
for filename in filenames:
    with open(filename) as f:
        text = f.read()
    words = text.split()

    cnt = Counter()
    for word in words:
        cnt[word] += 1
    file_dict[filename] = dict(cnt)

print("file_dict: ", file_dict)

# Then, collect the per-file frequencies for each word
word_dict = {}
for filename, words in file_dict.items():
    for word, count in words.items():
        if word not in word_dict.keys():
            word_dict[word] = {filename: count}
        else:
            if filename not in word_dict[word].keys():
                word_dict[word][filename] = count    
            else:
                word_dict[word][filename] += count


print("word_dict: ", word_dict)

Output:

file_dict:  {'file1.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 1}, 'file2.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 2, 'but': 1, 'multi': 1, 'line': 1}}
word_dict:  {'this': {'file1.txt': 1, 'file2.txt': 1}, 'is': {'file1.txt': 1, 'file2.txt': 1}, 'an': {'file1.txt': 1, 'file2.txt': 1}, 'example': {'file1.txt': 1, 'file2.txt': 2}, 'but': {'file2.txt': 1}, 'multi': {'file2.txt': 1}, 'line': {'file2.txt': 1}}
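
If the flat list layout from the question is really needed, the nested dictionary can be converted afterwards. This is only a minimal sketch, assuming word_dict has the {word: {filename: count}} shape printed above; the helper name flatten_counts is made up for illustration and is not part of the original answer.

```python
def flatten_counts(word_dict):
    """Turn {word: {filename: count}} into {word: [filename, count, ...]}."""
    flat = {}
    for word, per_file in word_dict.items():
        pairs = []
        for filename, count in per_file.items():
            pairs.extend([filename, count])  # file name first, then its count
        flat[word] = pairs
    return flat


# Sample input copied from part of the output above
word_dict = {
    'example': {'file1.txt': 1, 'file2.txt': 2},
    'but': {'file2.txt': 1},
}
print(flatten_counts(word_dict))
# {'example': ['file1.txt', 1, 'file2.txt', 2], 'but': ['file2.txt', 1]}
```

Words that are missing from a file simply get no entry for that file, so the lists can have different lengths.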

Answer 2

This is a good use case for collections.Counter; I suggest making a counter for each file.

from collections import Counter

def make_counter(filename):
    cnt = Counter()

    with open(filename) as f:
        for line in f:                # reading line by line performs better on big files
            cnt.update(line.split())  # split the line on whitespace and update the word counts

    print(filename, cnt)
    return cnt

This function can be used for each file, making a dict that holds all the counters:

filename_list = ['f1.txt', 'f2.txt', 'f3.txt']
counter_dict = {                      # this will hold a counter for each file
    fn: make_counter(fn)
    for fn in filename_list}

Now a set can be used to get all the different words that appear in the files:

all_words = set(                      # this will hold all different words that appear
    word                              # in any of the files
    for cnt in counter_dict.values()
    for word in cnt.keys())

And these lines print each word and the count that word has in each file:

for word in sorted(all_words):
    print(word)
    for fn in filename_list:
        print('  {}: {}'.format(fn, counter_dict[fn][word]))
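
The loop above looks up every word in every file's counter, including files where the word never occurs. That is safe because a Counter returns 0 for a missing key instead of raising KeyError. A small sketch with made-up sample text:

```python
from collections import Counter

cnt = Counter("this is an example".split())

print(cnt["example"])    # 1
print(cnt["missing"])    # 0, no KeyError
print("missing" in cnt)  # False: the lookup did not add the key
```

With a plain dict, the same lookup would need counter_dict[fn].get(word, 0).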

Obviously, you can adjust the printing to your specific needs, but this approach should give you the flexibility you need.

If you would rather have one dict with all the words as keys and their per-file counts as values, you could try something like this:

all_words = {}

for fn, cnt in counter_dict.items():
    for word, n in cnt.items():
        all_words.setdefault(word, {}).setdefault(fn, 0)
        all_words[word][fn] += n
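
To see what such a combined dictionary looks like, here is a minimal sketch that uses in-memory counters instead of real files. The sample sentences are made up, and counter_dict is only a stand-in for the one built earlier. Since each counter already holds the total for its file, a plain assignment is enough in this variant.

```python
from collections import Counter

# Stand-in for counter_dict, built from strings instead of files
counter_dict = {
    'f1.txt': Counter('i like tea'.split()),
    'f2.txt': Counter('you like what i like'.split()),
}

all_words = {}
for fn, cnt in counter_dict.items():
    for word, n in cnt.items():
        # each (file, word) pair occurs once, so assign the count directly
        all_words.setdefault(word, {})[fn] = n

print(all_words['like'])  # {'f1.txt': 1, 'f2.txt': 2}
print(all_words['tea'])   # {'f1.txt': 1}
```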


Answer 1
1 2018-11-15 21:41:43

Answer 2
0 2018-11-15 22:16:03
