
How can I write the name of the text file before the frequency of each word?

How can I include the text file name with each word frequency, so that the output first shows the file name and then the frequency of the word in that file? For example: {'like': ['file1', 2, 'file2', 4]}. Here 'like' is a word that both files contain, and I want file1 and file2 written before their frequencies. It should work for any number of files.

Here is my code:

file_list = [open(file, 'r') for file in files]
num_files = len(file_list)
wordFreq = {}
for i, f in enumerate(file_list):
    for line in f:
        for word in line.lower().split():
            if word not in wordFreq:
                wordFreq[word] = [0 for _ in range(num_files)]
            wordFreq[word][i] += 1

I know that my code is not pretty and not exactly what you want, but it is a solution. I would prefer using a dictionary instead of a list structure like ['file1', 2, 'file2', 4].

Let's define 2 files as an example:

file1.txt:

this is an example

file2.txt:

this is an example
but multi line example

Here is the solution:

from collections import Counter

filenames = ["file1.txt", "file2.txt"]

# First, find word frequencies in files
file_dict = {}
for filename in filenames:
    with open(filename) as f:
        text = f.read()
    words = text.split()

    cnt = Counter()
    for word in words:
        cnt[word] += 1
    file_dict[filename] = dict(cnt)

print("file_dict: ", file_dict)

#Then, calculate frequencies in files for each word 
word_dict = {}
for filename, words in file_dict.items():
    for word, count in words.items():
        if word not in word_dict.keys():
            word_dict[word] = {filename: count}
        else:
            if filename not in word_dict[word].keys():
                word_dict[word][filename] = count    
            else:
                word_dict[word][filename] += count


print("word_dict: ", word_dict)

Output:

file_dict:  {'file1.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 1}, 'file2.txt': {'this': 1, 'is': 1, 'an': 1, 'example': 2, 'but': 1, 'multi': 1, 'line': 1}}
word_dict:  {'this': {'file1.txt': 1, 'file2.txt': 1}, 'is': {'file1.txt': 1, 'file2.txt': 1}, 'an': {'file1.txt': 1, 'file2.txt': 1}, 'example': {'file1.txt': 1, 'file2.txt': 2}, 'but': {'file2.txt': 1}, 'multi': {'file2.txt': 1}, 'line': {'file2.txt': 1}}
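
If the flat list layout from the question is still needed, the nested dictionary can be converted afterwards. This is only a minimal sketch, assuming a `word_dict` shaped like the output above; the helper name `to_flat_lists` is hypothetical and not part of the original answer:

```python
# Sample data copied from part of the output above.
word_dict = {
    "this": {"file1.txt": 1, "file2.txt": 1},
    "example": {"file1.txt": 1, "file2.txt": 2},
    "but": {"file2.txt": 1},
}


def to_flat_lists(word_dict):
    """Turn {word: {filename: count}} into {word: [filename, count, ...]}."""
    flat = {}
    for word, per_file in word_dict.items():
        entries = []
        for filename, count in per_file.items():
            # Put the file name directly before its count.
            entries.extend([filename, count])
        flat[word] = entries
    return flat


flat = to_flat_lists(word_dict)
print(flat["example"])  # ['file1.txt', 1, 'file2.txt', 2]
```

The nested dictionary is usually easier to query, so a conversion like this is probably best kept as a final formatting step.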

This is a good use case for collections.Counter; I suggest making a counter for each file.

from collections import Counter

def make_counter(filename):
    cnt = Counter()

    with open(filename) as f:
        for line in f:                # reading line by line is more efficient for big files
            cnt.update(line.split())  # split the line on whitespace and update the word counts

    print(filename, cnt)
    return cnt

This function can be used for each file, making a dict that holds all the counters:

filename_list = ['f1.txt', 'f2.txt', 'f3.txt']
counter_dict = {                      # this will hold a counter for each file
    fn: make_counter(fn)
    for fn in filename_list}

Now a set can be used to get all the different words that appear in the files:

all_words = set(                      # this will hold all different words that appear
    word                              # in any of the files
    for cnt in counter_dict.values()
    for word in cnt.keys())

And these lines print each word and the count that word has in each file:

for word in sorted(all_words):
    print(word)
    for fn in filename_list:
        print('  {}: {}'.format(fn, counter_dict[fn][word]))

Obviously, you can adjust the printing to your specific needs, but this approach should give you the flexibility you need.
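
The printing loop looks up every word in every counter, including files where the word never occurs. That works because a `Counter` returns 0 for a missing key instead of raising `KeyError`. A small sketch, using made-up in-memory text rather than real files:

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of two files.
counter_dict = {
    "f1.txt": Counter("this is an example".split()),
    "f2.txt": Counter("but multi line example".split()),
}

# "multi" never occurs in f1.txt, yet the lookup is safe.
print(counter_dict["f1.txt"]["multi"])  # 0
print(counter_dict["f2.txt"]["multi"])  # 1

# The lookup does not add the missing key to the counter.
print("multi" in counter_dict["f1.txt"])  # False
```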


If you would rather have one dict with all the words as keys and their per-file counts as values, you could try something like this:

all_words = {}

for fn, cnt in counter_dict.items():
    for word, n in cnt.items():
        all_words.setdefault(word, {}).setdefault(fn, 0)
        all_words[word][fn] += n  # add this file's count for the word
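
A similar nested dictionary can also be built with `collections.defaultdict`, which creates the inner dictionaries on first access. This is a sketch rather than the original answer's method; `sample_counters` and `words_by_file` are hypothetical names, and the counters come from made-up strings instead of real files:

```python
from collections import Counter, defaultdict

# Hypothetical stand-ins for the per-file counters built earlier.
sample_counters = {
    "f1.txt": Counter("like this like that".split()),
    "f2.txt": Counter("like like like like".split()),
}

words_by_file = defaultdict(dict)
for fn, cnt in sample_counters.items():
    for word, n in cnt.items():
        # Each (word, file) pair occurs once, so plain assignment is enough.
        words_by_file[word][fn] = n

print(words_by_file["like"])  # {'f1.txt': 2, 'f2.txt': 4}
```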
