How to get all types of file extensions within a directory using os.walk or glob.glob
I have code that detects the language of files in a directory. However, it only handles the one extension named in the code (.txt). How can I detect the language of files with any extension in the directory (for example .pdf, .xlsx, .docx, etc.), not only .txt? The code is attached for reference. I would like to know how this can be done using glob and os.walk.
import csv
from fnmatch import fnmatch

try:
    from langdetect import detect
except ImportError:
    # Fallback when langdetect is not installed
    detect = lambda _: '<dunno>'

import os

rootdir = '.'  # current directory
extension = '.txt'
file_pattern = '*' + extension

with open('output.csv', 'w', newline='', encoding='utf-8') as outfile:
    csvwriter = csv.writer(outfile)
    for dirpath, subdirs, filenames in os.walk(os.path.abspath(rootdir)):
        for filename in filenames:
            if fnmatch(filename, file_pattern):
                # Note: detect() is given the path string here, not the file's contents
                lang = detect(os.path.join(dirpath, filename))
                csvwriter.writerow([dirpath, filename, lang])
IIUC, you could replace your fnmatch check with:
eoi = ['*.pdf', '*.xlsx', '*.docx', '*.txt']  # extensions of interest list
if any(fnmatch(filename, ext) for ext in eoi):
    lang = ...
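Since the question asks about both os.walk and glob, here is a minimal sketch of the same filtering done each way. The names `EXTENSIONS`, `find_with_walk`, and `find_with_glob` are made up for illustration and are not part of the original code. It compares extensions with `os.path.splitext` instead of `fnmatch`, which is an alternative to the check above, not a requirement. One difference worth knowing: `glob` skips hidden (dot-prefixed) files and directories by default, while `os.walk` does not.

```python
import glob
import os

# Hypothetical set of extensions of interest (lowercase, with leading dot).
EXTENSIONS = {'.pdf', '.xlsx', '.docx', '.txt'}


def find_with_walk(rootdir, extensions=EXTENSIONS):
    """Return sorted paths under rootdir whose extension is in `extensions`."""
    matches = []
    for dirpath, _subdirs, filenames in os.walk(rootdir):
        for filename in filenames:
            # splitext returns (stem, extension); compare case-insensitively
            if os.path.splitext(filename)[1].lower() in extensions:
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


def find_with_glob(rootdir, extensions=EXTENSIONS):
    """Same result via glob; '**' with recursive=True descends into subdirectories."""
    pattern = os.path.join(glob.escape(rootdir), '**', '*')
    matches = []
    for path in glob.glob(pattern, recursive=True):
        # The pattern also matches directories, so keep regular files only
        if os.path.isfile(path) and os.path.splitext(path)[1].lower() in extensions:
            matches.append(path)
    return sorted(matches)
```

Either function could feed the loop in the question, for example `for path in find_with_walk('.')`, with the language detection applied to each returned path. To accept every extension, drop the membership test altogether.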