Python: How to move files into a structured folder based on year/month/day format?

Question

Currently I have a Spark job that reads a file, creates a DataFrame, applies some transformations, and then writes the records out in "year/month/day" format. I am achieving this with:

(
    df.write.option("delimiter", "\t")
    .option("header", False)
    .mode("append")
    .partitionBy("year", "month", "day")
    .option("compression", "gzip")
    .csv(config["destination"])
)

I want to achieve the same thing in a Pythonic way. In the end, the output should look like:

data/2022/04/14
data/2022/04/15

Answer 1

Based on your question, instead of using partitionBy you can also modify config['destination'] so that it already contains the date, since S3 takes care of creating the necessary folders underneath the S3 path.

>>> from datetime import datetime
>>> s3_dump_path = config["destination"]  # e.g. 's3://test-path/'
>>> curr_date = datetime.now().date()
>>> year, month, day = curr_date.strftime('%Y/%m/%d').split('/')
>>> # Strip the trailing slash so the join does not produce '//'
>>> s3_new_path = '/'.join([s3_dump_path.rstrip('/'), year, month, day])
>>> s3_new_path
's3://test-path/2022/04/14'
>>> config["destination"] = s3_new_path
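The path construction above can also be wrapped in a small helper that takes the date as an argument, which makes it easier to test or to backfill an earlier day. This is only a sketch: `dated_path` is a hypothetical name, not part of Spark or any S3 library, and the bucket is a placeholder.

```python
from datetime import date


def dated_path(base: str, when: date) -> str:
    """Return base/YYYY/MM/DD without doubling the path separator."""
    # rstrip("/") avoids 's3://bucket//2022/...' when base ends with '/'
    return "/".join([base.rstrip("/"), when.strftime("%Y/%m/%d")])


print(dated_path("s3://test-path/", date(2022, 4, 14)))  # s3://test-path/2022/04/14
```

Passing `date.today()` as `when` would give the same result as the interactive session above.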

(
    df.write.option("delimiter", "\t")
    .option("header", False)
    .mode("append")
    .option("compression", "gzip")
    .csv(config["destination"])
)
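Two differences from the original job are worth keeping in mind. Spark's partitionBy names its directories in key=value style (for example year=2022), and it places each record according to its own year, month and day columns, whereas the snippet above puts the whole write under the current date. If the records should be split by their own date columns without Spark, a minimal pure-Python sketch could look like the following. It assumes the rows are already available as dictionaries with `year`, `month` and `day` keys and that the destination is a local directory; `write_partitioned` and the file name `part.tsv.gz` are hypothetical choices, not anything Spark produces.

```python
import csv
import gzip
from collections import defaultdict
from pathlib import Path

PARTITION_KEYS = ("year", "month", "day")


def write_partitioned(rows, destination):
    """Append rows to destination/YYYY/MM/DD/part.tsv.gz, grouped by date."""
    groups = defaultdict(list)
    for row in rows:
        # Zero-pad so the folders sort correctly: 2022/04/14, not 2022/4/14
        key = (
            f"{int(row['year']):04d}",
            f"{int(row['month']):02d}",
            f"{int(row['day']):02d}",
        )
        groups[key].append(row)

    written = []
    for (year, month, day), group in sorted(groups.items()):
        folder = Path(destination) / year / month / day
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "part.tsv.gz"
        # "at" appends a new gzip member, similar in spirit to mode("append")
        with gzip.open(target, "at", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            for row in group:
                # Like partitionBy, leave the partition columns out of the data
                writer.writerow(
                    [value for name, value in row.items() if name not in PARTITION_KEYS]
                )
        written.append(target)
    return written
```

Calling `write_partitioned(rows, "data")` with rows dated 2022-04-14 and 2022-04-15 would create data/2022/04/14 and data/2022/04/15, each holding one gzip-compressed, tab-delimited file without a header.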

Answer 1: accepted (1), 2022-04-14 06:56:24
