Databricks - mounted S3 - how to get file metadata like last modified date (Python)

I have mounted an S3 bucket in Databricks. I can see the list of files and read them using Python:

ACCESS_KEY = "XXXXXXXXXX"
SECRET_KEY = "XXXXXXXXXXXXXX"
ENCODED_SECRET_KEY = SECRET_KEY.replace("/", "%2F")
AWS_BUCKET_NAME = "testbucket"
MOUNT_NAME = "awsmount1"

dbutils.fs.mount("s3a://%s:%s@%s" % (ACCESS_KEY, ENCODED_SECRET_KEY, AWS_BUCKET_NAME), "/mnt/%s" % MOUNT_NAME)
display(dbutils.fs.ls("/mnt/%s/data" % MOUNT_NAME))

I want to find the last modified date of the file I am reading. I couldn't find much apart from the Java option in "Databricks read Azure blob last modified date", which is for Azure Blob. Is there a Python-native option in Databricks to read file metadata?

If I understand correctly, you need the last modified date for a mounted file in Azure Databricks using the native Python SDK.

Here is sample code to get the metadata from Azure Blob storage:

from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob SDK (v2.x and earlier)

block_blob_service = BlockBlobService(account_name='accountName', account_key='accountKey')
container_name = 'containerName'

# List the blobs in an existing container and print each one's last modified time
for blob in block_blob_service.list_blobs(container_name):
    last_modified = block_blob_service.get_blob_properties(container_name, blob.name).properties.last_modified
    print("\t Blob name: " + blob.name)
    print(last_modified)

You can get more details on this here.

If you are looking for S3, then I would suggest using Boto. Boto3 returns a datetime object for LastModified when you use the S3 Object Python object:

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Object.last_modified
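For a single file read through the mount, a minimal sketch of that idea might look like the following. It assumes the bucket is mounted at its root, as in the question, so that the part of the path after /mnt/<mount name>/ is the S3 key. The helper names mount_path_to_key and get_last_modified are hypothetical; head_object is the boto3 client call that returns an object's metadata without downloading it.

```python
def mount_path_to_key(path, mount_name):
    """Strip the '/mnt/<mount_name>/' prefix to recover the S3 object key.

    Assumes the bucket root is mounted at /mnt/<mount_name>.
    """
    # Paths returned by dbutils.fs.ls carry a 'dbfs:' scheme prefix
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    prefix = "/mnt/%s/" % mount_name
    if not path.startswith(prefix):
        raise ValueError("%r is not under %r" % (path, prefix))
    return path[len(prefix):]


def get_last_modified(s3_client, bucket, key):
    """Return the object's LastModified value using a HEAD request."""
    response = s3_client.head_object(Bucket=bucket, Key=key)
    return response["LastModified"]


# Usage on a cluster with boto3 installed and credentials configured
# (the file name example.csv is a placeholder):
#   import boto3
#   s3 = boto3.client("s3")
#   key = mount_path_to_key("/mnt/awsmount1/data/example.csv", "awsmount1")
#   print(get_last_modified(s3, "testbucket", key))
```

Passing the client in as an argument keeps the helpers independent of how credentials are set up on the cluster.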

To compare LastModified to today's date (Python 3):

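import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)

s3 = boto3.client('s3', region_name='eu-west-1')

objects = s3.list_objects(Bucket='my_bucket')

# LastModified is a timezone-aware datetime, so compare calendar dates (UTC)
# rather than exact timestamps, which would practically never be equal
for o in objects.get("Contents", []):
    if o["LastModified"].date() == today.date():
        print(o["Key"])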
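A single listing call returns at most 1000 keys, so for larger buckets the same comparison could be sketched with a paginator and a cutoff time. The function names list_all_objects and keys_modified_since are hypothetical helpers; get_paginator("list_objects_v2") is the boto3 client call used to follow the pages.

```python
from datetime import datetime, timedelta, timezone


def list_all_objects(s3_client, bucket, prefix=""):
    """Yield every object entry under the prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages for an empty prefix have no "Contents" entry
        for obj in page.get("Contents", []):
            yield obj


def keys_modified_since(objects, cutoff):
    """Return the keys whose LastModified is at or after the cutoff."""
    return [o["Key"] for o in objects if o["LastModified"] >= cutoff]


# Usage on a cluster with boto3 installed and credentials configured:
#   import boto3
#   s3 = boto3.client("s3", region_name="eu-west-1")
#   cutoff = datetime.now(timezone.utc) - timedelta(days=1)
#   print(keys_modified_since(list_all_objects(s3, "my_bucket"), cutoff))
```

The cutoff has to be a timezone-aware datetime, because the LastModified values returned by boto3 are timezone-aware and Python refuses to compare aware and naive datetimes.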

Reference

Hope it helps.
