I'm trying to delete empty directories from an Azure Storage container that is mounted to my DBFS.
I'm able to list all directories that contain no files:
%sh
find /dbfs/mnt/test/logs/2021 -empty -type d
Result:
/dbfs/mnt/test/logs/2021/02/12
/dbfs/mnt/test/logs/2021/02/15
/dbfs/mnt/test/logs/2021/02/16
But when I try to delete them, the command fails with "Resource temporarily unavailable":
%sh
find /dbfs/mnt/test/logs/ -type d -exec rmdir {} \;
Result:
rmdir: failed to remove '/dbfs/mnt/test/logs/': Directory not empty
rmdir: failed to remove '/dbfs/mnt/test/logs/2021': Directory not empty
rmdir: failed to remove '/dbfs/mnt/test/logs/2021/02': Directory not empty
rmdir: failed to remove '/dbfs/mnt/test/logs/2021/02/12': Resource temporarily unavailable
I can successfully remove files older than a certain number of days; only removing directories fails. This command, which removes files, works:
%sh
find /dbfs/mnt/test/logs/ -name "*.log" -type f -mtime +5 -exec rm -f {} \;
The first thing to remember: DBFS is an abstraction over cloud blob storage, where there are no real directories. They are just prefixes used to organize data. If you run
%sh
ls -ls /dbfs/mnt/test/logs/
you may notice that all directories have the same timestamp, possibly a recent one; I don't remember offhand how it's calculated. Only files have real timestamps.
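To make the "directories are just prefixes" point concrete, here is a small illustrative sketch in plain Python; it does not call any Databricks or Azure API. It assumes a simplified object-store model in which only full blob names exist, and the function name `implied_directories` and the blob names are hypothetical.

```python
def implied_directories(blob_names):
    """Return the directory-like prefixes implied by a flat list of blob names.

    In this simplified model the store keeps only full names such as
    "logs/2021/02/10/app.log"; "logs/", "logs/2021/", ... exist only as
    prefixes shared by those names.
    """
    dirs = set()
    for name in blob_names:
        parts = name.split("/")[:-1]  # drop the last component (the file name)
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")
    return dirs


# Hypothetical blob names, for illustration only.
blobs = [
    "logs/2021/02/10/app.log",
    "logs/2021/02/11/app.log",
]
print(sorted(implied_directories(blobs)))
```

In this model nothing in the flat list stands for an empty directory on its own: once the last blob under a prefix is gone, that "directory" also disappears from the derived set.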
So if you need to remove directories reliably, it's better to use dbutils.fs.rm('/mnt/test/logs/', True) (in Python, or the equivalent in Scala), which removes the directory and everything under it recursively (see the docs). But there are limitations, such as no support for wildcards, so you need to generate the list of directories to delete yourself and then delete them.
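The "generate a list, then delete" approach could look roughly like the sketch below. It assumes the mount is also readable through the local `/dbfs` FUSE path (as in the `%sh` commands above) and that it runs in a Databricks Python notebook, where you pass `dbutils.fs.rm` as the `rm` argument. The helper names (`find_fileless_dirs`, `topmost`, `to_dbfs_path`, `remove_fileless_dirs`) are hypothetical. Note that it treats a directory holding only empty subdirectories as removable too, which is broader than `find -empty`.

```python
import os


def find_fileless_dirs(local_root):
    """Return directories under local_root (not local_root itself) that hold
    no files at any depth, listed children-first.

    os.walk(topdown=False) yields each subdirectory before its parent, so a
    parent can be classified from the results already collected for its children.
    """
    fileless = set()
    ordered = []
    for dirpath, dirnames, filenames in os.walk(local_root, topdown=False):
        children = [os.path.join(dirpath, d) for d in dirnames]
        if not filenames and all(c in fileless for c in children):
            fileless.add(dirpath)
            if os.path.normpath(dirpath) != os.path.normpath(local_root):
                ordered.append(dirpath)
    return ordered


def topmost(dirs):
    """Keep only directories whose parent is not itself in the list."""
    dir_set = set(dirs)
    return [d for d in dirs if os.path.dirname(d) not in dir_set]


def to_dbfs_path(local_path, fuse_prefix="/dbfs"):
    """Map a FUSE path such as /dbfs/mnt/x to the DBFS path /mnt/x."""
    if not local_path.startswith(fuse_prefix + "/"):
        raise ValueError(f"{local_path!r} is not under {fuse_prefix}/")
    return local_path[len(fuse_prefix):]


def remove_fileless_dirs(local_root, rm, dry_run=True, fuse_prefix="/dbfs"):
    """Delete file-less directories under local_root by calling rm(path, True).

    Hypothetical helper: in a notebook, pass rm=dbutils.fs.rm. With
    dry_run=True (the default) nothing is deleted; the paths are only returned.
    """
    targets = [
        to_dbfs_path(d, fuse_prefix)
        for d in topmost(find_fileless_dirs(local_root))
    ]
    if not dry_run:
        for path in targets:
            # recurse=True also removes the empty subdirectories underneath.
            rm(path, True)
    return targets
```

In a notebook you would first call `remove_fileless_dirs("/dbfs/mnt/test/logs/2021", dbutils.fs.rm)` to review the returned list, then repeat with `dry_run=False`. Only the topmost file-less directory of each branch is passed to `dbutils.fs.rm`, so nested empty directories go away with the recursive call instead of being deleted one by one.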