How to get the correct information from each file using Python

Question

My code below runs: it recursively walks the paths and scans all files, outputting information such as file size, last access time, last modified time and creation time. However, the information is incorrect: the output does not match what each file's properties show. Is there a way to retrieve the correct information from the files?

Below is my code:

import os
import pandas as pd
import time

pd.set_option('display.max_rows', 3000)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 3000)

pop={}

output = 'C:\\Users\\'
starting_dir='X:\\'
print(starting_dir)

for root, dirs, files in os.walk(starting_dir):
    print(root)
    with os.scandir(root) as i:
        for entry in i:
            file_size = round((os.path.getsize(entry) / 1048576), 4)
            print(file_size)
            access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(entry)))
            print(access_time)
            modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(entry)))
            print(modify_time)
            created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(entry)))
            print(created_time)
            pop[root]= 'Directory|' + str(file_size) + '|'+ str(access_time)+'|'+ str(modify_time) +'|'+ str(created_time)
        for name in files:
            da_file=os.path.join(root,name)
            pop[da_file]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)

print('Scan Complete!')

dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

dfr['file_name'] = dfr['combo'].str.split('|').str[0]
dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
dfr['last_access']= (dfr['combo'].str.split('|').str[2])
dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
dfr['created'] = (dfr['combo'].str.split('|').str[4])

dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

print('Output Ready!')

Answer 1

You are not visiting every file correctly. In the second block of your code (the loop over files) you are using the values that were set in the first block. So these values match the last file that was visited via scandir, not the file you are currently making an entry for, which is the file from the walk.
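
The effect can be reproduced in isolation. In this small sketch (the names sizes_buggy and sizes_fixed and the sample data are made up for illustration), the first function reads a variable left over from an earlier loop, much like the second block in the question, while the second records the value that belongs to each item.

```python
def sizes_buggy(entries):
    """Mimic the question: the value is set in one loop and used in another."""
    result = {}
    for name, size in entries:
        file_size = size          # overwritten on every iteration
    for name, _ in entries:
        result[name] = file_size  # always the size of the LAST entry
    return result


def sizes_fixed(entries):
    """Record the value for the same item that is being processed."""
    return {name: size for name, size in entries}


# Hypothetical sample data: (file name, size) pairs
entries = [('a.txt', 10), ('b.txt', 20), ('c.txt', 30)]
print(sizes_buggy(entries))  # {'a.txt': 30, 'b.txt': 30, 'c.txt': 30}
print(sizes_fixed(entries))  # {'a.txt': 10, 'b.txt': 20, 'c.txt': 30}
```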

I don't think you need to use scandir at all, just the walk, perhaps like this:

import os
import pandas as pd
import time

def make_entry(pop,root,name):
    full_name = os.path.join(root,name)
    file_size = round((os.path.getsize(full_name) / 1048576), 4)
    access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(full_name)))
    modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(full_name)))
    created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(full_name)))
    pop[full_name]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)
    print(pop[full_name])


def main():
    pd.set_option('display.max_rows', 3000)
    pd.set_option('display.max_columns', 10)
    pd.set_option('display.width', 3000)

    pop={}

    output = 'C:\\Users\\'
    starting_dir='X:\\'
    print(starting_dir)

    for root, dirs, files in os.walk(starting_dir):
        for dir_name in dirs:  # avoid shadowing the built-in dir()
            make_entry(pop,root,dir_name)
        for name in files:
            make_entry(pop,root,name)

    print('Scan Complete!')

    dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

    dfr['file_name'] = dfr['combo'].str.split('|').str[0]
    dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
    dfr['last_access']= (dfr['combo'].str.split('|').str[2])
    dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
    dfr['created'] = (dfr['combo'].str.split('|').str[4])

    dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

    print('Output Ready!')

if __name__=="__main__":
    main()
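
As a possible variation on the code above, here is a minimal sketch that calls os.stat once per path and keeps each row as a dict instead of a '|'-joined string. The helper names stat_row and scan_tree are hypothetical. Formatting with time.localtime rather than time.gmtime is an assumption on my part: file property dialogs usually show local time, so UTC dates can be off by a day for files touched near midnight.

```python
import os
import time


def stat_row(path, name):
    """Build one row of metadata for a single path (hypothetical helper)."""
    st = os.stat(path)  # one call returns the size and all three timestamps

    def fmt(ts):
        # Assumption: local time is what the file properties dialog shows
        return time.strftime('%d/%m/%Y', time.localtime(ts))

    return {
        'file_location': path,
        'file_name': name,
        'file_size': round(st.st_size / 1048576, 4),  # bytes -> MiB
        'last_access': fmt(st.st_atime),
        'last_modify': fmt(st.st_mtime),
        # st_ctime is creation time on Windows, metadata-change time on Unix
        'created': fmt(st.st_ctime),
    }


def scan_tree(starting_dir):
    """Walk starting_dir and return one row per directory and per file."""
    rows = []
    for root, dirs, files in os.walk(starting_dir):
        for name in dirs + files:
            rows.append(stat_row(os.path.join(root, name), name))
    return rows
```

With this shape, pd.DataFrame(scan_tree(starting_dir)) should give the named columns directly, so the str.split steps are no longer needed.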
