简体   繁体   中英

databricks - mounted S3 - how to get file metadata like last modified date (Python)

I have mounted a s3 bucket in my databricks and I can see the list of files and i can read the files as well using python

ACCESS_KEY = "XXXXXXXXXX"
SECRET_KEY = "XXXXXXXXXXXXXX"
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "testbucket"
MOUNT_NAME = "awsmount1"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s/data" % MOUNT_NAME))

I want to find out the last modified date of the file i am reading, I couldn't find much but the java option Databricks read Azure blob last modified date for azure blob, is there a python native option in databricks to read the file metadata.

If i understand correctly, you need the last modified date for mounted file in Azure data bricks using python native sdk.

Here is the sample code to get the metadata information from Azure blob:

from azure.storage.blob import BlockBlobService
block_blob_service = BlockBlobService(account_name='accoutName', account_key='accountKey')
container_name ='containerName'
block_blob_service.create_container(container_name)
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    lastModified= BlockBlobService.get_blob_properties(block_blob_service,container_name,blob.name).properties.last_modified
    print("\t Blob name: " + blob.name)
    print(lastModified)

you can get more details on this here .

If you are looking fro S3 then i would suggest you to use Boto.oto3 returns a datetime object for LastModified when you use the the (S3) Object python object:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.last_modified

To compare LastModified to today's date (Python3):

import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)

s3 = boto3.client('s3', region_name='eu-west-1')

objects = s3.list_objects(Bucket='my_bucket')

for o in objects["Contents"]:
    if o["LastModified"] == today:
        print(o["Key"])

Reference

Hope it helps.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM