
How to loop over nested directories in Databricks (Python/PySpark) in a faster, more efficient way

I am trying to loop through the nested directories created by an Event Hub process and move the files into a single folder.

Here is what I tried. I would like to know whether there is a better way of doing it.

    Folder structure:  Part/year/month/date/hour/min
    Ex:  01/2022/01/01/00/00
         01/2022/01/01/00/01
         01/2022/01/01/00/02
         01/2022/01/01/00/00
    
My attempt:

    loop1=dbutils.fs.ls(folder)
    for one in loop1:
      loop2=dbutils.fs.ls(str(one.path).replace('dbfs:',''))
      for two in loop2:
        loop3=dbutils.fs.ls(str(two.path).replace('dbfs:',''))
        for three in loop3:
          loop4=dbutils.fs.ls(str(three.path).replace('dbfs:',''))
          for four in loop4:
            loop5=dbutils.fs.ls(str(four.path).replace('dbfs:',''))
            for five in loop5:
              loop6=dbutils.fs.ls(str(five.path).replace('dbfs:',''))
              for six in loop6:
                loop7=dbutils.fs.ls(str(six.path).replace('dbfs:',''))
                for seven in loop7:
                  source_file=str(seven.path).replace('dbfs:','')
                  dbutils.fs.mv(source_file,dest,True)
                  print(source_file)
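One possible way to avoid hard-coding one loop per directory level is to walk the tree with a stack and use the `isDir()` method on the entries returned by `dbutils.fs.ls`. This is only a minimal sketch under some assumptions: `iter_files` and `move_all` are hypothetical helper names, the listing and move functions are passed in as parameters so the logic can also be tried outside a notebook, and `folder` and `dest` are the same variables as in the attempt above. As far as I know `dbutils.fs.ls` accepts `dbfs:/...` paths directly, so the `replace('dbfs:','')` step is left out here.

```python
from typing import Callable, Iterable, Iterator, List


def iter_files(ls: Callable[[str], Iterable], root: str) -> Iterator[str]:
    """Yield the path of every file under root, at any depth.

    ls is expected to behave like dbutils.fs.ls: it returns entries
    that have a .path attribute and an isDir() method.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        for entry in ls(current):
            if entry.isDir():
                # Directory: visit it later instead of nesting another loop.
                stack.append(entry.path)
            else:
                yield entry.path


def move_all(ls: Callable, mv: Callable, root: str, dest: str) -> List[str]:
    """Move every file found under root into dest and return the moved paths."""
    # Collect all paths first so the tree is not modified while it is listed.
    paths = list(iter_files(ls, root))
    for path in paths:
        mv(path, dest)
    return paths


# In a Databricks notebook (assumption: folder and dest are already defined):
# moved = move_all(dbutils.fs.ls, dbutils.fs.mv, folder, dest)
```

This makes the same number of `ls` calls as the nested loops, so the gain is mainly that it no longer depends on the folder depth. Note that files with the same name in different minute folders would collide in a single destination folder, so a unique destination name per file may be needed.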
