
Given an HDFS path, how do I know whether it is a folder or a file, using Python?

I am not checking a local file. Given a string, I want to find out in Python whether it refers to a folder or a file on HDFS.

For example, the string could look like this:

hdfs://nameservice1/client/tdb_histscen_2/part-00001

It could be a file, or a folder that contains folder(s) and/or file(s).

Thank you very much.

Updated 20181105, following the suggestion from Jim Todd below:

hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/ doesn't exist at all

hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2 is a folder

As you can see below, -test returns the same result for both of them. What am I missing here?

Thank you.

[rxie@cedgedev03 code]$ hdfs dfs -test -e hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/
[rxie@cedgedev03 code]$ hdfs dfs -test -e hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2/
[rxie@cedgedev03 code]$ hdfs dfs -test -d hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_2/
[rxie@cedgedev03 code]$ hdfs dfs -test -d hdfs://nameservice1/client/nova/scenarios/warehouse/pricetek_ibbk/tdb_histscen_asd/

There are several libraries for working with Hadoop in Python. For instance, if you use Pydoop, you can use the pydoop.hdfs.path.isfile method.

You can check out their documentation for details.
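A minimal sketch of that approach follows. It assumes Pydoop is installed alongside a working Hadoop client, and that pydoop.hdfs.path also offers exists and isdir with the same meaning as their os.path counterparts (only isfile is named above, so check the Pydoop documentation). The function name classify_hdfs_path and its hpath parameter are hypothetical, introduced here for illustration.

```python
def classify_hdfs_path(path, hpath=None):
    """Return 'missing', 'directory' or 'file' for the given path.

    hpath is any module exposing exists() and isdir(); when omitted,
    pydoop.hdfs.path is imported (assumption: Pydoop is installed).
    """
    if hpath is None:
        import pydoop.hdfs.path as hpath  # needs Pydoop and a Hadoop client
    if not hpath.exists(path):
        return "missing"
    if hpath.isdir(path):
        return "directory"
    return "file"
```

On a machine with access to the cluster, classify_hdfs_path('hdfs://nameservice1/client/tdb_histscen_2/part-00001') would report which of the three cases applies. Passing os.path as hpath runs the same logic against the local filesystem, which is handy for trying it out without Hadoop.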

If your intention is to check from Python whether the URI is a directory, you can alternatively do it like this:

import subprocess

location = 'hdfs://nameservice1/client/tdb_histscen_2/part-00001'

def hdfs_test(flag, path):
    # "hdfs dfs -test" prints nothing; its result is the exit code (0 means true).
    # Passing the arguments as a list avoids shell quoting problems in the path.
    return subprocess.call(['hdfs', 'dfs', '-test', flag, path]) == 0

# Check that the location exists
if hdfs_test('-e', location):
    # Check whether it is a directory
    if hdfs_test('-d', location):
        print('The given URI is a directory: ' + location)
    else:
        print('The given URI is a file: ' + location)
else:
    print(location + ' does not exist. Please check the URI')

About the command: hdfs dfs -test -[ezd] URI

Options:
-e checks whether the file exists, returning 0 if true.
-z checks whether the file is zero length, returning 0 if true.
-d checks whether the path is a directory, returning 0 if true.

Example: hdfs dfs -test -d $yourdir
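Note that "returning 0" refers to the command's exit status: -test writes nothing to the terminal, so the result has to be read from the exit code (for example with echo $? in a shell). The sketch below reads that exit code from Python for each of the three options. It assumes the hdfs command is on the PATH; the function name hdfs_test_flags and its run parameter are hypothetical, added only so the logic can be exercised without a cluster.

```python
import subprocess

def hdfs_test_flags(uri, run=subprocess.run):
    """Return {'e': bool, 'z': bool, 'd': bool} for a URI, based on exit codes."""
    results = {}
    for flag in ("e", "z", "d"):
        # "hdfs dfs -test" prints nothing; exit code 0 means the test is true.
        completed = run(["hdfs", "dfs", "-test", "-" + flag, uri])
        results[flag] = completed.returncode == 0
    return results
```

For an existing folder, the 'e' and 'd' entries would be expected to be True; for a path that does not exist, 'e' would be False.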
