Python: How to move files in a structured folder based on year/month/date format?

Currently I have a Spark job that reads a file, creates a DataFrame, applies some transformations, and then writes the records out in a "year/month/date" folder layout. I do this with:

df.write.option("delimiter", "\t").option("header", False).mode(
    "append"
).partitionBy("year", "month", "day").option("compression", "gzip").csv(
    config["destination"]
)

I want to achieve the same thing in a Pythonic way. In the end, the output should look like:

data/2022/04/14
data/2022/04/15

Based on your question, instead of using partitionBy you can also modify config['destination'] so that it already contains the year/month/day of the current date. S3 will take care of creating the necessary folders underneath the S3 path.

>>> from datetime import datetime
>>> s3_dump_path = config["destination"]  # e.g. 's3://test-path/'
>>> curr_date = datetime.now().date()
>>> year, month, day = curr_date.strftime('%Y/%m/%d').split('/')
>>> # Strip the trailing slash so the join does not produce '//'
>>> s3_new_path = '/'.join([s3_dump_path.rstrip('/'), year, month, day])
>>> s3_new_path
's3://test-path/2022/04/14'
>>> config["destination"] = s3_new_path

df.write.option("delimiter", "\t").option("header", False).mode(
    "append"
).option("compression", "gzip").csv(
    config["destination"]
)
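
If the files live on a local filesystem rather than S3, the same year/month/day layout can be built without Spark. The following is only a minimal sketch under that assumption: the helper names dated_dir and move_into_dated_dir, the "data" root, and the choice of date (passed in, defaulting to today) are hypothetical and not part of the original question.

```python
import shutil
from datetime import date
from pathlib import Path


def dated_dir(root, day):
    """Return root/YYYY/MM/DD for the given date, with zero-padded parts."""
    return Path(root) / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"


def move_into_dated_dir(src, root, day=None):
    """Move one file into root/YYYY/MM/DD, creating the folders if needed."""
    day = day or date.today()
    target_dir = dated_dir(root, day)
    # parents=True builds the year and month levels in one call
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(src).name
    shutil.move(str(src), str(destination))
    return destination
```

For example, move_into_dated_dir("records.csv.gz", "data", date(2022, 4, 14)) would place the file under data/2022/04/14, matching the layout shown in the question.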
