
How to loop over nested directories in Databricks (Python/PySpark) in a faster, more efficient way

I am trying to loop over the nested directories created by an EventHub process and move the files into a single folder.

Here is what I tried; I would like to know if there is a better way of doing it.

    Folder structure:  Part/year/month/date/hour/min
    Ex:  01/2022/01/01/00/00
         01/2022/01/01/00/01
         01/2022/01/01/00/02
         01/2022/01/01/00/00
    
I tried it like this:

    # One dbutils.fs.ls call and one nested loop per level of Part/year/month/date/hour/min.
    loop1 = dbutils.fs.ls(folder)
    for one in loop1:
      loop2 = dbutils.fs.ls(str(one.path).replace('dbfs:', ''))
      for two in loop2:
        loop3 = dbutils.fs.ls(str(two.path).replace('dbfs:', ''))
        for three in loop3:
          loop4 = dbutils.fs.ls(str(three.path).replace('dbfs:', ''))
          for four in loop4:
            loop5 = dbutils.fs.ls(str(four.path).replace('dbfs:', ''))
            for five in loop5:
              loop6 = dbutils.fs.ls(str(five.path).replace('dbfs:', ''))
              for six in loop6:
                loop7 = dbutils.fs.ls(str(six.path).replace('dbfs:', ''))
                for seven in loop7:
                  # At the deepest level, move each file into the single destination folder.
                  source_file = str(seven.path).replace('dbfs:', '')
                  dbutils.fs.mv(source_file, dest, True)
                  print(source_file)
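
For comparison, a recursive walk avoids hard-coding one loop per directory level and handles trees of any depth. This is a minimal sketch, assuming a Databricks notebook where `dbutils` is predefined and where `folder` and `dest` are the same source root and destination used above:

    def move_files(path, dest):
        # dbutils.fs.ls returns FileInfo objects exposing .path and isDir().
        for entry in dbutils.fs.ls(path):
            if entry.isDir():
                # Recurse into the year/month/date/hour/min subdirectories.
                move_files(entry.path, dest)
            else:
                # Move each file into the single destination folder.
                dbutils.fs.mv(entry.path, dest, True)
                print(entry.path)

    move_files(folder, dest)

Note that dbutils.fs.ls and dbutils.fs.mv accept paths with the dbfs: prefix directly, so the .replace('dbfs:', '') calls in the original code are not strictly needed.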

