How to get the correct information from every file using python

My code below runs: it recursively walks the paths and scans all files, outputting information such as file size, last access time, last modified time, and creation time. However, the information is incorrect: the output does not match what each file's properties show. Is there a way to retrieve the correct information from the files?

Below is my code:

import os
import pandas as pd
import time

pd.set_option('display.max_rows', 3000)
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 3000)

pop={}

output = 'C:\\Users\\'
starting_dir='X:\\'
print(starting_dir)

for root, dirs, files in os.walk(starting_dir):
    print(root)
    with os.scandir(root) as i:
        for entry in i:
            file_size = round((os.path.getsize(entry) / 1048576), 4)
            print(file_size)
            access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(entry)))
            print(access_time)
            modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(entry)))
            print(modify_time)
            created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(entry)))
            print(created_time)
            pop[root]= 'Directory|' + str(file_size) + '|'+ str(access_time)+'|'+ str(modify_time) +'|'+ str(created_time)
        for name in files:
            da_file=os.path.join(root,name)
            pop[da_file]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)

print('Scan Complete!')

dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

dfr['file_name'] = dfr['combo'].str.split('|').str[0]
dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
dfr['last_access']= (dfr['combo'].str.split('|').str[2])
dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
dfr['created'] = (dfr['combo'].str.split('|').str[4])

dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

print('Output Ready!')

You are not visiting every file correctly. In the second block of your code (the `for name in files` loop) you are using the values that were set in the first block (the `scandir` loop). So these values match the last entry that was visited via scandir, not the file you are currently making an entry for, which is the file from the walk.
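To see the effect in isolation, here is a minimal sketch that needs no real files. The `sizes` dictionary and its file names are made up for illustration; the point is only that a variable assigned inside one loop keeps its last value when a later loop reuses it.

```python
# Hypothetical sizes standing in for what os.path.getsize would return.
sizes = {"a.txt": 10, "b.txt": 200, "c.txt": 3000}

# First loop: `size` is overwritten on every pass and ends up
# holding the value for the last entry only.
for name in sizes:
    size = sizes[name]

# Second loop reusing the leftover variable: every file gets the
# same (stale) value, which mirrors the bug described above.
stale = {name: size for name in sizes}

# Looking the value up per file gives each file its own value.
fresh = {name: sizes[name] for name in sizes}

print(stale)
print(fresh)
```

In the question's code the same thing happens with `file_size`, `access_time`, `modify_time` and `created_time`: they are computed in the `scandir` loop and then reused unchanged for every name in `files`.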

I don't think you need to use scandir at all, just the walk, perhaps like this:

import os
import pandas as pd
import time

def make_entry(pop,root,name):
    full_name = os.path.join(root,name)
    file_size = round((os.path.getsize(full_name) / 1048576), 4)
    access_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getatime(full_name)))
    modify_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getmtime(full_name)))
    created_time = time.strftime('%d/%m/%Y', time.gmtime(os.path.getctime(full_name)))
    pop[full_name]= name + '|'+  str(file_size) + '|'+ str(access_time) +'|'+ str(modify_time) +'|'+ str(created_time)
    print(pop[full_name])


def main():
    pd.set_option('display.max_rows', 3000)
    pd.set_option('display.max_columns', 10)
    pd.set_option('display.width', 3000)

    pop={}

    output = 'C:\\Users\\'
    starting_dir='X:\\'
    print(starting_dir)

    for root, dirs, files in os.walk(starting_dir):
        for dirname in dirs:  # avoid shadowing the built-in dir()
            make_entry(pop,root,dirname)
        for name in files:
            make_entry(pop,root,name)

    print('Scan Complete!')

    dfr=pd.DataFrame(pop.items(),columns=['file_location','combo'])

    dfr['file_name'] = dfr['combo'].str.split('|').str[0]
    dfr['file_size'] = (dfr['combo'].str.split('|').str[1]).astype(float)
    dfr['last_access']= (dfr['combo'].str.split('|').str[2])
    dfr['last_modify'] = (dfr['combo'].str.split('|').str[3])
    dfr['created'] = (dfr['combo'].str.split('|').str[4])

    dfr.to_excel(output + 'sharepoint_output.xlsx',index=False)

    print('Output Ready!')

if __name__=="__main__":
    main()
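
A possible variation on the code above, offered as a sketch rather than a definitive fix. It makes one `os.stat` call per path instead of four separate `os.path.get*` calls, and it keeps each field in its own dictionary key instead of joining and re-splitting a `|`-separated string. It also formats dates with `time.localtime` rather than `time.gmtime`, on the assumption that the goal is to match the local dates a file manager shows. The names `format_day` and `collect_stats` are hypothetical helpers, not part of the original code. Note that `st_ctime` has traditionally meant creation time on Windows but metadata-change time on Unix-like systems.

```python
import os
import time


def format_day(timestamp):
    # Local time, so the date lines up with what a file manager displays.
    return time.strftime('%d/%m/%Y', time.localtime(timestamp))


def collect_stats(starting_dir):
    """Return one dict per directory and file found under starting_dir."""
    rows = []
    for root, dirs, files in os.walk(starting_dir):
        for name in dirs + files:
            full_name = os.path.join(root, name)
            try:
                info = os.stat(full_name)  # one system call per path
            except OSError:
                # Skip entries that vanish or cannot be read.
                continue
            rows.append({
                'file_location': full_name,
                'file_name': name,
                'file_size': round(info.st_size / 1048576, 4),  # MiB
                'last_access': format_day(info.st_atime),
                'last_modify': format_day(info.st_mtime),
                'created': format_day(info.st_ctime),  # platform-dependent
            })
    return rows
```

The resulting list can be passed straight to `pd.DataFrame(collect_stats(starting_dir))`, which yields the same columns without the `str.split('|')` steps.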

