Given an HDFS path, how do I know whether it is a folder or a file, using Python?
I am not checking a local file. For a given string, I want to find out in Python whether it is a folder or a file on HDFS.
For example, a string could look like this:
hdfs://nameservice1/client/tdb_histscen_2/part-00001
It could be a file, or a folder that contains folder(s) and/or file(s).
Thank you very much.
Updated 20181105, as per the suggestion from Jim Todd below:
hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/ doesn't exist at all.
hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2 is a folder.
As you can see below, -test returns the same result for both of them. What am I missing here?
Thank you.
[rxie@cedgedev03 code]$ hdfs dfs -test -e hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/
[rxie@cedgedev03 code]$ hdfs dfs -test -e hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2/
[rxie@cedgedev03 code]$ hdfs dfs -test -d hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2/
[rxie@cedgedev03 code]$ hdfs dfs -test -d hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/
There are several libraries for working with Hadoop in Python. For instance, if you use Pydoop, you can use the pydoop.hdfs.path.isfile method.
You can check out their documentation.
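As a minimal sketch of the Pydoop approach, the helper below combines exists, isdir and isfile-style checks into one result. The function name classify and the hpath parameter are hypothetical, introduced only for illustration. It assumes Pydoop is installed and that pydoop.hdfs.path offers exists and isdir alongside isfile, mirroring os.path; please verify this against the Pydoop documentation for your version.

```python
def classify(path, hpath=None):
    """Return 'missing', 'directory' or 'file' for the given path.

    hpath is any module-like object exposing exists() and isdir().
    By default it is pydoop.hdfs.path (assumption: Pydoop is installed).
    """
    if hpath is None:
        # Imported lazily so the function can be tried without Pydoop
        import pydoop.hdfs.path as hpath
    if not hpath.exists(path):
        return "missing"
    return "directory" if hpath.isdir(path) else "file"
```

On a machine with Pydoop configured, classify('hdfs://nameservice1/client/tdb_histscen_2/part-00001') would be the typical call. Passing os.path as hpath lets you try the same logic against local paths.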
If your intention is to check from Python whether the URI is a directory, you can alternatively check as below:
import subprocess

location = 'hdfs://nameservice1/client/tdb_histscen_2/part-00001'

# "echo $?" prints the exit code of the previously executed command
filexistchk = "hdfs dfs -test -e " + location + "; echo $?"
filexistchk_output = subprocess.Popen(filexistchk, shell=True, stdout=subprocess.PIPE).communicate()
filechk = "hdfs dfs -test -d " + location + "; echo $?"
filechk_output = subprocess.Popen(filechk, shell=True, stdout=subprocess.PIPE).communicate()

# Check if the location exists (an exit code of 0 means true)
if '1' not in str(filexistchk_output[0]):
    # Check if it is a directory
    if '1' not in str(filechk_output[0]):
        print('The given URI is a directory: ' + location)
    else:
        print('The given URI is a file: ' + location)
else:
    print(location + " does not exist. Please check the URI")
About the command: hdfs dfs -test -[ezd] URI
Options:
The -e option will check to see if the file exists, returning 0 if true.
The -z option will check to see if the file is zero length, returning 0 if true.
The -d option will check to see if the path is a directory, returning 0 if true.
Example: hdfs dfs -test -d $yourdir
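Because -test reports its result only through the exit status and prints nothing, the status can also be read directly instead of parsing the output of echo $?. The following is a sketch under that assumption, not a definitive implementation: the names hdfs_test and classify_hdfs_path and the run parameter are hypothetical, and it presumes the hdfs command is on the PATH.

```python
import subprocess


def hdfs_test(flag, location, run=subprocess.call):
    """Return True if `hdfs dfs -test <flag> <location>` exits with status 0."""
    # A list of arguments avoids the shell, so the URI needs no quoting
    return run(["hdfs", "dfs", "-test", flag, location]) == 0


def classify_hdfs_path(location, run=subprocess.call):
    """Return 'missing', 'directory' or 'file' based on -e and -d."""
    if not hdfs_test("-e", location, run):
        return "missing"
    if hdfs_test("-d", location, run):
        return "directory"
    return "file"
```

For example, classify_hdfs_path('hdfs://nameservice1/client/tdb_histscen_2/part-00001') runs the two checks in order and stops early if the path does not exist. The run parameter exists only so that the logic can be exercised without a cluster by passing a stand-in function.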