
Does Databricks DBFS support file metadata such as file/folder create date or modified date?

I'm attempting to crawl through a directory in a Databricks notebook to find the latest parquet file. dbutils.fs.ls does not appear to return any metadata about files or folders. Are there any alternative methods in Python to do this? The data is stored in an Azure Data Lake mounted to the DBFS under "/mnt/foo". Any help or pointers are appreciated.

On Azure Databricks, as far as I know, the DBFS path dbfs:/mnt/foo is the same as the Linux path /dbfs/mnt/foo, so you can simply use os.stat(path) in Python to get file metadata such as the create date or modified date.


Here is my sample code.

import os
from datetime import datetime

path = '/dbfs/mnt/test'
fdpaths = [os.path.join(path, fd) for fd in os.listdir(path)]
for fdpath in fdpaths:
    statinfo = os.stat(fdpath)
    # Note: on Linux, st_ctime is the inode change time, not the true
    # creation time; st_mtime is the last modification time.
    create_date = datetime.fromtimestamp(statinfo.st_ctime)
    modified_date = datetime.fromtimestamp(statinfo.st_mtime)
    print(f"The stat info of path {fdpath} is {statinfo},\n"
          f"\twhose create date and modified date are {create_date} and {modified_date}")

The output lists each path's stat info along with its create date and modified date.
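Since the original goal was to find the latest parquet file under the mount, the same os.stat approach can be wrapped in a small helper that scans the directory and picks the file with the newest modification time. This is a minimal sketch; the function name latest_parquet and the example path are my own, not from the original post.

```python
import os
from pathlib import Path

def latest_parquet(root):
    """Return the Path of the most recently modified .parquet file
    under root (searched recursively), or None if there are none."""
    files = Path(root).rglob("*.parquet")
    # st_mtime is the last-modification timestamp from os.stat
    return max(files, key=lambda p: p.stat().st_mtime, default=None)

# On Databricks you would call it against the FUSE path, e.g.:
# latest = latest_parquet("/dbfs/mnt/foo")
```

Because it only uses the standard library against the /dbfs FUSE mount, it avoids depending on dbutils entirely.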
