
How to get the correct information from every file using Python

My code below runs: it recursively walks the paths and scans all files, outputting information such as file size, last access time, last modified time and creation time. However, the output is incorrect: the values do not match what each file's Properties dialog shows. Is there a way to retrieve the correct information for every file?

Below is my code:

import os
import pandas as pd
import time

pd.set_option('display.max_rows', 3000)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 3000)

pop={}

output = 'C:\\Users\\'
starting_dir='X:\\'
print(starting_dir)

for root, dirs, files in os.walk(starting_dir):
    print(root)
    with os.scandir(root) as i:
        for entry in i:
            file_size = round((os.path.getsize(entry) / 1048576), 4)
            print(file_size)
            access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(entry)))
            print(access_time)
            modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(entry)))
            print(modify_time)
            created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(entry)))
            print(created_time)
            pop[root]= 'Directory|' + str(file_size) + '|'+ str(access_time)+'|'+ str(modify_time) +'|'+ str(created_time)
        for name in files:
            da_file=os.path.join(root,name)
            pop[da_file]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)

print('Scan Complete!')

dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

dfr['file_name'] = dfr['combo'].str.split('|').str[0]
dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
dfr['last_access']= (dfr['combo'].str.split('|').str[2])
dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
dfr['created'] = (dfr['combo'].str.split('|').str[4])

dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

print('Output Ready!')

You are not visiting every file correctly. In the second loop (for name in files) you reuse the values that were set in the first loop (for entry in i). Those values therefore describe the last entry visited via scandir, not the file from the walk that you are currently making an entry for.
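The effect described above can be reproduced in miniature. The sketch below is illustrative only: the file names and sizes are made up, and buggy_report and fixed_report are hypothetical stand-ins for the two loops in the question, not code from it.

```python
# Made-up sizes standing in for real files (assumption for illustration).
sizes = {"a.txt": 10, "b.txt": 20, "c.txt": 30}

def buggy_report(sizes):
    """Mimic the question's structure: compute in one loop, record in another."""
    report = {}
    for name in sizes:
        size = sizes[name]      # overwritten on every pass
    for name in sizes:
        report[name] = size     # still holds the value from the last pass
    return report

def fixed_report(sizes):
    """Compute the value for the same item that is being recorded."""
    report = {}
    for name in sizes:
        report[name] = sizes[name]
    return report

print(buggy_report(sizes))  # {'a.txt': 30, 'b.txt': 30, 'c.txt': 30}
print(fixed_report(sizes))  # {'a.txt': 10, 'b.txt': 20, 'c.txt': 30}
```

In the buggy version every entry gets the size of the last item, which is the same pattern as every file in a directory receiving the size and timestamps of the last scandir entry.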

I don't think you need scandir at all; os.walk on its own is enough, perhaps like this:

import os
import pandas as pd
import time

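def make_entry(pop,root,name):
    # Every value is computed from full_name, so each entry describes its own file or directory.
    full_name = os.path.join(root,name)
    # Size in MiB (1048576 bytes), rounded to 4 decimal places.
    file_size = round((os.path.getsize(full_name) / 1048576), 4)
    # Timestamps are formatted as dd/mm/yyyy; note that gmtime gives UTC, not local time.
    access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(full_name)))
    modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(full_name)))
    created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(full_name)))
    # Pack the fields into one '|'-separated string; main() splits them into columns later.
    pop[full_name]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)
    print(pop[full_name])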


def main():
    pd.set_option('display.max_rows', 3000)
    pd.set_option('display.max_columns', 10)
    pd.set_option('display.width', 3000)

    pop={}

    output = 'C:\\Users\\'
    starting_dir='X:\\'
    print(starting_dir)

    for root, dirs, files in os.walk(starting_dir):
        for dir_name in dirs:  # dir_name avoids shadowing the built-in dir()
            make_entry(pop,root,dir_name)
        for name in files:
            make_entry(pop,root,name)

    print('Scan Complete!')

    dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

    dfr['file_name'] = dfr['combo'].str.split('|').str[0]
    dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
    dfr['last_access']= (dfr['combo'].str.split('|').str[2])
    dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
    dfr['created'] = (dfr['combo'].str.split('|').str[4])

    dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

    print('Output Ready!')

if __name__=="__main__":
    main()
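As a possible variation on the code above, here is a minimal sketch that calls os.stat once per path and collects plain dictionaries instead of a '|'-joined string. The names describe and scan are hypothetical, and the switch from gmtime to local time is my assumption: the Windows Properties dialog shows local time, so UTC dates can differ by a day for files touched near midnight. Also note that st_ctime has traditionally been the creation time on Windows but is the last metadata change on Unix.

```python
import os
from datetime import datetime

def _fmt(timestamp):
    # fromtimestamp() converts to local time (assumption: this is what
    # should match the Properties dialog); gmtime() would give UTC instead.
    return datetime.fromtimestamp(timestamp).strftime('%d/%m/%Y')

def describe(path):
    """Build one record for path from a single os.stat() call."""
    st = os.stat(path)
    return {
        'file_location': path,
        'file_name': os.path.basename(path),
        'file_size': round(st.st_size / 1048576, 4),  # size in MiB
        'last_access': _fmt(st.st_atime),
        'last_modify': _fmt(st.st_mtime),
        'created': _fmt(st.st_ctime),  # platform-dependent meaning, see above
    }

def scan(starting_dir):
    """Walk starting_dir and describe every directory and file below it."""
    records = []
    for root, dirs, files in os.walk(starting_dir):
        for name in dirs + files:
            records.append(describe(os.path.join(root, name)))
    return records
```

Passing the returned list to pd.DataFrame should give the same columns directly, which would remove the need for the combo column and the repeated str.split calls.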


 