How to loop over nested directories in Databricks (PySpark) in a faster, more efficient way
I am trying to loop over the nested directories created by an EventHub capture process and move the files into a single folder. Here is what I tried; I'd like to know whether there is a better way of doing it.
Folder structure: Part/year/month/date/hour/min
Ex: 01/2022/01/01/00/00
    01/2022/01/01/00/01
    01/2022/01/01/00/02
    01/2022/01/01/00/00
Tried like this:

loop1 = dbutils.fs.ls(folder)
for one in loop1:
    loop2 = dbutils.fs.ls(str(one.path).replace('dbfs:', ''))
    for two in loop2:
        loop3 = dbutils.fs.ls(str(two.path).replace('dbfs:', ''))
        for three in loop3:
            loop4 = dbutils.fs.ls(str(three.path).replace('dbfs:', ''))
            for four in loop4:
                loop5 = dbutils.fs.ls(str(four.path).replace('dbfs:', ''))
                for five in loop5:
                    loop6 = dbutils.fs.ls(str(five.path).replace('dbfs:', ''))
                    for six in loop6:
                        loop7 = dbutils.fs.ls(str(six.path).replace('dbfs:', ''))
                        for seven in loop7:
                            source_file = str(seven.path).replace('dbfs:', '')
                            dbutils.fs.mv(source_file, dest, True)
                            print(source_file)
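One way to avoid seven hand-written loops is a short recursive helper that walks the tree to any depth. This is a sketch, not a tested Databricks answer: it assumes the entries behave like the `FileInfo` objects `dbutils.fs.ls` returns (exposing `.path` and `.isDir()`), and it takes the listing function as a parameter so the fixed nesting depth no longer matters.

```python
def deep_ls(path, list_dir):
    """Yield every file path under `path`, recursing into subdirectories.

    `list_dir` is the directory-listing callable -- on Databricks this
    would be dbutils.fs.ls; entries are assumed to expose .path and
    .isDir() like the FileInfo objects dbutils returns.
    """
    for entry in list_dir(path):
        if entry.isDir():
            # Descend into the subdirectory (year/month/date/hour/min ...)
            yield from deep_ls(entry.path, list_dir)
        else:
            yield entry.path

# Hypothetical usage on Databricks (`folder` and `dest` are the same
# variables as in the original snippet):
# for f in deep_ls(folder, dbutils.fs.ls):
#     dbutils.fs.mv(str(f).replace('dbfs:', ''), dest, True)
```

Each `dbutils.fs.mv` is still one driver-side call per file, so this is no faster per file than the nested loops, but it handles any depth and is far easier to maintain.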