Databricks - mounted S3 - how to get file metadata like last modified date (Python)
I have mounted an S3 bucket in Databricks. I can see the list of files, and I can read the files as well using Python.
ACCESS_KEY = "XXXXXXXXXX"
SECRET_KEY = "XXXXXXXXXXXXXX"
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "testbucket"
MOUNT_NAME = "awsmount1"
dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s/data" % MOUNT_NAME))
I want to find out the last modified date of the file I am reading. I couldn't find much beyond the Java option in "Databricks read Azure blob last modified date", which is for Azure Blob. Is there a Python-native option in Databricks to read the file metadata?
If I understand correctly, you need the last modified date of a mounted file in Azure Databricks using the Python native SDK.
Here is sample code to get the metadata information from Azure Blob:
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob SDK (before v12)

block_blob_service = BlockBlobService(account_name='accountName', account_key='accountKey')
container_name = 'containerName'
# Creates the container if it does not exist yet
block_blob_service.create_container(container_name)
generator = block_blob_service.list_blobs(container_name)
for blob in generator:
    # Fetch the properties of each blob and read its last-modified timestamp
    last_modified = block_blob_service.get_blob_properties(container_name, blob.name).properties.last_modified
    print("\t Blob name: " + blob.name)
    print(last_modified)
You can get more details on this here.
If you are looking for S3, then I would suggest using Boto. Boto3 returns a datetime object for LastModified when you use the (S3) Object Python object:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.last_modified
To compare LastModified to today's date (Python 3):
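import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)
s3 = boto3.client('s3', region_name='eu-west-1')
objects = s3.list_objects(Bucket='my_bucket')
# "Contents" is missing from the response when the bucket is empty
for o in objects.get("Contents", []):
    # LastModified is a timezone-aware UTC datetime. Compare calendar dates,
    # because an exact timestamp match with "now" would never occur.
    if o["LastModified"].date() == today.date():
        print(o["Key"])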
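For the single mounted file from the question, a minimal sketch along the same lines might look like the following. It assumes the bucket root is mounted at the mount point (as in the question's `dbutils.fs.mount` call), so the S3 key is the path relative to the mount. The helper names `mount_path_to_key` and `get_last_modified` and the file name `example.csv` are hypothetical and only for illustration; `head_object` is the boto3 client call that returns an object's metadata without downloading it.

```python
def mount_path_to_key(path, mount_point):
    """Translate a path under a mount point into the S3 object key.

    Assumes the bucket root was mounted at mount_point.
    """
    prefix = mount_point.rstrip("/") + "/"
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {mount_point!r}")
    return path[len(prefix):]


def get_last_modified(s3_client, bucket, key):
    """Return the object's LastModified value (a timezone-aware datetime)."""
    # head_object fetches only the object's metadata, not its contents.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["LastModified"]


# Example usage (needs boto3 and valid AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   key = mount_path_to_key("/mnt/awsmount1/data/example.csv", "/mnt/awsmount1")
#   print(get_last_modified(s3, "testbucket", key))
```

Passing the client in as an argument keeps the credentials handling outside the helper and makes it easy to try the function with a stub client first.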
Hope it helps.