
Python - reading files from directory: file not found in subdirectory (which is there)

I am convinced it is something simply syntactic, but I cannot figure out why my code:

import os
from collections import Counter
d = {}
for filename in os.listdir('testfilefolder'):
    f = open(filename,'r')
    d = (f.read()).lower()
    freqs = Counter(d)
    print(freqs)

will not work. It can apparently see into the 'testfilefolder' folder and knows the file is there, because the error message says that 'file2.txt' is not found. So it can find the file well enough to tell me that it cannot find it...

However, I can get this piece of code to work:

from collections import Counter
d = {}
f = open("testfilefolder/file2.txt",'r')
d = (f.read()).lower()
freqs = Counter(d)
print(freqs)

Bonus: is this a good way of doing what I am trying to do (read from a file and count the frequencies of words)? This is my first day with Python (although I have some programming experience).

I have to say that I am liking Python!

Thanks,

Brian

Change:

f = open(filename,'r')

To:

f = open(os.path.join('testfilefolder',filename),'r')

Which is effectively what you are doing in:

f = open("testfilefolder/file2.txt",'r')

Reason: you are listing the files in 'testfilefolder' (a subdirectory of your current directory), but then trying to open each file in the current directory itself, where it does not exist.
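To make the reason concrete, here is a minimal sketch. It assumes a throwaway directory created with tempfile so that it runs anywhere; the folder and file names simply mirror the ones in the question, and the variable names are made up for illustration.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on real files.
# The names 'testfilefolder' and 'file2.txt' mirror the question.
base = tempfile.mkdtemp()
folder = os.path.join(base, 'testfilefolder')
os.mkdir(folder)
with open(os.path.join(folder, 'file2.txt'), 'w') as f:
    f.write('Hello world')

# os.listdir() returns bare names, with no directory part.
names = os.listdir(folder)

# Joining each name with the folder gives a path that open() can resolve.
full_paths = [os.path.join(folder, name) for name in names]

for path in full_paths:
    with open(path) as f:
        print(os.path.basename(path), len(f.read()))
```

Calling open() on one of the bare names would only succeed if the current working directory happened to be the folder itself.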

As isedev pointed out, listdir() returns just the file names, not full or relative paths. Another way to deal with this problem is to os.chdir() into the directory in question and then call os.listdir('.').

Secondly, it seems your goal is to count the frequency of words, not letters (characters). For that, you will need to break up the contents of each file into words. I prefer to use a regular expression for this.
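The difference between counting characters and counting words can be seen in a short sketch. The sample sentence is made up for illustration, and the pattern is a lowercase-only variant of the one used in the solution (the text is lowercased first).

```python
import re
from collections import Counter

text = "The cat saw the other cat.".lower()

# Passing a string to Counter counts individual characters,
# because iterating over a string yields one character at a time.
char_counts = Counter(text)

# Splitting into words first makes Counter count whole words.
words = re.findall(r"[a-z0-9']+", text)
word_counts = Counter(words)

print(char_counts.most_common(3))
print(word_counts.most_common(2))
```

This is why the code in the question reports letter frequencies: Counter is handed the raw file contents as one string.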

Thirdly, your solution counts word frequencies for each file separately. If you ever need a count across all files, create a Counter() object at the beginning, then call its update() method to tally the counts.
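As a rough illustration of how update() accumulates counts, the sketch below uses hypothetical in-memory word lists in place of real files; the file names and words are invented.

```python
from collections import Counter

# Hypothetical per-file word lists standing in for the contents of two files.
words_by_file = {
    'file1.txt': ['red', 'green', 'red'],
    'file2.txt': ['green', 'blue'],
}

total = Counter()
for name, words in words_by_file.items():
    per_file = Counter(words)   # counts for this file only
    total.update(words)         # adds to the running counts, does not replace them
    print(name, per_file)

print(total)
```

Unlike dict.update(), Counter.update() adds to existing counts, which is what makes it suitable for a running total.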

Without further ado, my solution:

import collections
import re
import os

all_files_frequency = collections.Counter()

previous_dir = os.getcwd()
os.chdir('testfilefolder')
for filename in os.listdir('.'):
    with open(filename) as f:
        file_contents = f.read().lower()

    words = re.findall(r"[a-zA-Z0-9']+", file_contents) # Breaks up into words
    frequency = collections.Counter(words)              # For this file only
    all_files_frequency.update(words)                   # For all files
    print(frequency)

os.chdir(previous_dir)

print('')
print(all_files_frequency)

