
How to get a list of all files in an HDFS folder using Python?

I would like to return a listing of all files in an HDFS folder using Python, preferably as a Pandas DataFrame. I have looked at subprocess.Popen, and that may be the best way, but if so, is there a way to parse out all the noise and only return the file names?

The hdfs module is out, as I can't get the config options working. I tried subprocess.Popen, but it returns so much extraneous stuff.
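
For reference, here is a minimal sketch of that subprocess route, not the original poster's code: it assumes the hdfs client is on PATH, uses /tmp/favorite_folder as an illustrative path, and parses the standard eight-column output of hdfs dfs -ls into a pandas DataFrame, keeping the path column at the end.

import subprocess
import pandas as pd

# Run the HDFS shell listing; check=True raises if the command fails.
result = subprocess.run(
    ["hdfs", "dfs", "-ls", "/tmp/favorite_folder"],  # illustrative path
    capture_output=True, text=True, check=True,
)

rows = []
for line in result.stdout.splitlines():
    # Skip the "Found N items" header line and any blank lines.
    if not line or line.startswith("Found"):
        continue
    # Typical -ls output: permissions, replication, owner, group,
    # size, date, time, path -- split into at most 8 fields.
    fields = line.split(None, 7)
    if len(fields) == 8:
        rows.append(fields)

df = pd.DataFrame(
    rows,
    columns=["permissions", "replication", "owner", "group",
             "size", "date", "time", "path"],
)
print(df["path"])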

Once you've named the path

from pathlib import Path

folder = Path("/tmp/favorite_folder/")

then it's just a matter of globbing some pattern, like folder.glob("*.csv"). Use a bare wildcard to get all names at a single level:

print(list(folder.glob("*")))  # glob() returns a lazy iterator, so materialize it before printing

To recurse through all levels, you might wish to rely on os.walk() .

https://docs.python.org/3/library/os.html#os.walk
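
For instance, a short sketch that walks a directory tree (a local path, used purely for illustration) and collects every file path it finds:

import os

# Walk the tree rooted at the folder and gather full paths of all files.
all_files = []
for dirpath, dirnames, filenames in os.walk("/tmp/favorite_folder/"):
    for name in filenames:
        all_files.append(os.path.join(dirpath, name))

print(all_files)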

Or, use a recursive glob pattern: folder.glob("**/*.csv")
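
And since the question asks for a Pandas DataFrame, a small sketch that wraps the matched paths in one (the column name "path" is just an example):

from pathlib import Path
import pandas as pd

folder = Path("/tmp/favorite_folder/")

# One row per matched file; glob yields Path objects, so convert to str.
df = pd.DataFrame({"path": [str(p) for p in sorted(folder.glob("**/*.csv"))]})
print(df)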
