
Scanning files in directories and subdirectories

Essentially, what I'm looking to do is search the files within a folder structure for a list of invoices that are provided and copy the desired data over to a new file. My script below works as described, but it chokes on search folders that contain subdirectories. I need to modify the script to scan the root folder and the files in its subdirectories. Any idea what to do? I've tried several different code updates, but none seem to work:

import tkinter
import os
import fnmatch
from tkinter import *
from tkinter import messagebox as tkMessageBox
from tkinter.filedialog import askopenfilename
from tkinter.filedialog import askdirectory
from pathlib import PureWindowsPath
from pathlib import Path

#filedialog  

content = ''
BrowsePath = ''
SearchPath = ''

top = tkinter.Tk()

#********************************************************FIELDS****************************************************************************
#Browse entry field
Browse1 = Label(text="Search List:").grid(row=0)

BrowsePath = StringVar()
BrowsePath.set("Select File Containing Invoice Numbers")
BrowseL = Label(bd=5,textvariable=BrowsePath, width=100,relief=SUNKEN).grid(row=0,column=1)



#Search Folder
Searce1 = Label( text="Search Folder:").grid(row=1)

SearchPath = StringVar()
SearchPath.set("Select Folder to Search")
SearchL = Label(bd=5,textvariable=SearchPath, width=100,relief=SUNKEN).grid(row=1,column=1)



#OutputFile
OutputL1 = Label( text="Output File:").grid(row=2)

OutputPath = StringVar()
OutputPath.set("File to Save Results to")
OutputL2 = Label(bd=5,textvariable=OutputPath, width=100,relief=SUNKEN).grid(row=2,column=1)


#********************************************************FUNCTIONS****************************************************************************

#Process complete function
def GetCallBack():
   tkMessageBox.showinfo( "Find Invoices", "Processing complete!")


#********************************************************FILE PICKERS****************************************************************************

    #Select file containing list of invoices
def GetFile():
    global content
    global BrowsePath
    filename = askopenfilename()
    infile = open(filename,'r')
    content = infile.read()
    BrowsePath.set(os.path.realpath(filename))
    return content

    #Select directory containing invoice files
def SearchDir():
    global content
    global SearchPath
    pathname = askdirectory()
    SearchPath.set(os.path.realpath(pathname))
    return content


    #Creates the save file with isolated invoices
def SaveFile():

    filename = os.path.abspath(os.path.join(SearchPath.get(),"Results.txt"))

    OutputPath.set(filename) #update label with location of file




#********************************************************READING invoice LIST FILE****************************************************************************


def  GetPOCount():
    PO = [line.rstrip('\n') for line in open(os.path.realpath(BrowsePath.get()))] #isolates list of invoices
    ponum_count = sum(1 for line in open(os.path.realpath(BrowsePath.get()))) #gets count of invoice numbers
    return PO, ponum_count #can be indexed


def GetFileNames():
    files = os.listdir(SearchPath.get()) #gets list of files
    return files #can be indexed

def GetFileLineCount():
    files = GetFileNames()
    file_count = len(fnmatch.filter(os.listdir(SearchPath.get()),'*.*'))
    line_count = sum(1 for line in open(os.path.realpath(os.path.join(SearchPath.get(),files[file_count-1])))) #gets count of lines in invoice file
    return line_count, file_count

def FindPOs():
    po_number = GetPOCount()[0]
    po_counter = GetPOCount()[1]

    print(po_number)
    print(po_counter)


    file_counter = GetFileLineCount()[1] 
    file_name = GetFileNames()

    print(file_name)
    print(file_counter)


    # For each file
    for filename in file_name:
        print("Searching " + filename)

        with open(os.path.join(SearchPath.get(),filename),'r') as content_file:
            line_count = sum(1 for line in content_file) #gets count of lines in invoice file
            print(line_count)
            po_line = [line.rstrip('\n') for line in open(os.path.realpath(os.path.join(SearchPath.get(),filename)))] #isolates each line
            result_filename = os.path.abspath(os.path.join(os.path.dirname(SearchPath.get()),"Results.txt"))
            OutputPath.set(result_filename)
            log = os.path.abspath(os.path.join(os.path.dirname(SearchPath.get()),"FoundInvoices.txt"))

            # For each line in file
            #TODO: make this for each po_line
            for PONum in po_number:
                print("looking for " + PONum)

                for line in range (0,line_count):

                    #locate Header Record
                    if po_line[line][16:18] == "10" or po_line[line][15:17] == "10":
                        print("On a header record")

                        if PONum in po_line[line].strip():
                            print("Looking for " + PONum)
                            # Write the current line to the results file
                            with open(result_filename,'a+') as file:
                                file.write(po_line[line] + '\n')

                            # Write this PONum to the log file
                            with open(log,'a+') as logs:
                                logs.write(PONum + '\n')

                            # Loop from the next line to the end
                            with open(result_filename,'a+') as file:
                                for z in range (line+1,line_count):
                                    if ((po_line[z][16:18] != "10") and (po_line[z] != '\n') and (po_line[z][15:17] != "10") and (po_line[z][16:18] != "05")):
                                        file.write(po_line[z] + '\n')
                                    else:
                                    # Once we've found a "10" or newline, stop printing this PO
                                        break


    GetCallBack()







#********************************************************BUTTONS****************************************************************************

# Search List Browse Button logic
BrowseButton = tkinter.Button(text ="Browse", command = GetFile).grid(row=0,column = 2)

# Search Directory Button logic
SearchButton = tkinter.Button(text ="Search", command = SearchDir).grid(row=1,column = 2)

# Find POs Button Logic
FindButton = tkinter.Button( text ="Get Invoices", command = FindPOs).grid(row=4,column = 1)


top.mainloop()

Your code is a bit overly complex. Perhaps this is only a portion of the total code. For instance, GetFileLineCount() returns two values, but one of them (line_count) is never used in your code. GetFileNames() could produce the same result:

def GetFileNames():
    files = os.listdir(SearchPath.get()) #gets list of files
    file_count = len(fnmatch.filter(files, '*.*')) #counts names that contain a dot
    return files, file_count #can be indexed

Or, better yet, you could replace GetFileNames with a function built on os.walk():

def GetFileNames():
    filepaths = []
    # os.walk visits SearchPath and every subdirectory beneath it
    for root, dirs, files in os.walk(SearchPath.get()):
        for file in files:
            filepaths.append(os.path.join(root, file))
    return filepaths

This will give you a list of the full paths of all the files under your SearchPath, including those in subdirectories. Then you can use the same loop without having to join your SearchPath with the filename each time:

filepaths = GetFileNames()
for filename in filepaths:
    print("Searching " + filename)

    with open(filename, 'r') as content_file:
        line_count = sum(1 for line in content_file)

... and so on.
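To try the directory walk on its own, without the Tk widgets, here is a minimal sketch. The names collect_file_paths and read_lines are hypothetical helpers, not part of your script, and the root folder is passed in as a plain string instead of being read from SearchPath. It also reads each file only once instead of opening it a second time to count lines.

```python
import os


def collect_file_paths(root_dir):
    """Return the full path of every file under root_dir, subdirectories included."""
    filepaths = []
    # os.walk yields (current_dir, subdirectory_names, file_names) per directory
    for current_dir, _subdirs, file_names in os.walk(root_dir):
        for name in file_names:
            filepaths.append(os.path.join(current_dir, name))
    return filepaths


def read_lines(path):
    """Read a file once and return its lines without trailing newlines."""
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle]
```

In the GUI script you would call collect_file_paths(SearchPath.get()) and loop over the result. The line count is then just len(read_lines(filename)), so no separate counting pass is needed.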

Note: I haven't rewritten all of your code for you. You will likely need to make some modifications here and there to make this work, but this should provide a solution to your problem.
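As one possible shape for those modifications, the record-matching part of FindPOs could be pulled into a function that works on a list of lines, so it can be applied to every path the walk returns. This is only a sketch under assumptions taken from the question's code: header records carry "10" at slice [16:18] or [15:17], and a block ends at the next header, a "05" record, or a blank line. The names is_header and extract_invoice_blocks are hypothetical. Unlike the original loop, it skips blank entries in the invoice list and returns results in file order, not grouped by invoice number.

```python
def is_header(line):
    # Assumption from the question: "10" at [16:18] or [15:17] marks a header
    return line[16:18] == "10" or line[15:17] == "10"


def extract_invoice_blocks(lines, invoice_numbers):
    """Return each wanted header line followed by its detail lines."""
    wanted = [number for number in invoice_numbers if number]  # drop blank entries
    results = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if is_header(line) and any(number in line for number in wanted):
            results.append(line)
            i += 1
            # Copy detail lines until the next header, a "05" record, or a blank line
            while (i < len(lines) and lines[i]
                   and not is_header(lines[i])
                   and lines[i][16:18] != "05"):
                results.append(lines[i])
                i += 1
        else:
            i += 1
    return results
```

The returned list can then be appended to Results.txt once per file, which avoids reopening the output file for every matching line.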
