
Access hdfs cluster from pydoop

I have an HDFS cluster and Python on the same Google Cloud Platform. I want to access the files present in the HDFS cluster from Python. I found that this can be done with pydoop, but I may be struggling to give it the right parameters. Below is the code that I have tried so far:

import pydoop.hdfs as hdfs
import pydoop

pydoop.hdfs.hdfs(host='url of the file system goes here',
                 port=9864, user=None, groups=None)

"""
 class pydoop.hdfs.hdfs(host='default', port=0, user=None, groups=None)

    A handle to an HDFS instance.

    Parameters

            host (str) – hostname or IP address of the HDFS NameNode. Set to an empty string (and port to 0) to connect to the local file system; set to 'default' (and port to 0) to connect to the default (i.e., the one defined in the Hadoop configuration files) file system.

            port (int) – the port on which the NameNode is listening

            user (str) – the Hadoop domain user name. Defaults to the current UNIX user. Note that, in MapReduce applications, since tasks are spawned by the JobTracker, the default user will be the one that started the JobTracker itself.

            groups (list) – ignored. Included for backwards compatibility.


"""

#print (hdfs.ls("/vs_co2_all_2019_v1.csv"))

It gives this error:

RuntimeError: Hadoop config not found, try setting HADOOP_CONF_DIR

And if I execute this line of code:

print (hdfs.ls("/vs_co2_all_2019_v1.csv"))

nothing happens. But this "vs_co2_all_2019_v1.csv" file does exist in the cluster, although it was not available at the moment I took the screenshot.
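For reference, the HADOOP_CONF_DIR mentioned in the error is the environment variable Pydoop uses to locate the Hadoop client configuration, and the 'default' host mode described in the docstring above relies on that same configuration. A minimal sketch, assuming a hypothetical config directory such as /etc/hadoop/conf:

import os

# Hypothetical path: point this at the directory holding core-site.xml and
# hdfs-site.xml for the cluster (on many Hadoop/Dataproc nodes this is
# /etc/hadoop/conf). Setting it in the shell before starting Python also works.
os.environ["HADOOP_CONF_DIR"] = "/etc/hadoop/conf"

import pydoop.hdfs as hdfs  # imported after the variable is set

# With a valid configuration, the 'default' filesystem from the docstring can
# be used instead of hard-coding a host and port:
fs = hdfs.hdfs(host="default", port=0)
print(fs.list_directory("/"))
fs.close()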

My HDFS screenshot is shown below:

[screenshot: HDFS structure]

and the credentials that I have are shown below:

[screenshot: credentials]

Can anybody tell me what I am doing wrong? Which credentials do I need to put where in the pydoop API? Or maybe there is another, simpler way around this problem. Any help will be much appreciated!

Have you tried the following?

import pydoop.hdfs as hdfs
import pydoop

hdfs_object = pydoop.hdfs.hdfs(host='url of the file system goes here',
                               port=9864, user=None, groups=None)
hdfs_object.list_directory("/vs_co2_all_2019_v1.csv")

or simply:

hdfs_object.list_directory("/")

Keep in mind that the module-level functions in pydoop.hdfs are not directly tied to the hdfs class instance (hdfs_object). Thus, the connection that you established in the first command is not used by hdfs.ls("/vs_co2_all_2019_v1.csv").
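For illustration, here is a minimal sketch of both routes; namenode-host and port 8020 are placeholders for your cluster's NameNode RPC endpoint (9864 is typically a DataNode HTTP port rather than the NameNode RPC port):

import pydoop.hdfs as hdfs

# Placeholder address: replace with the NameNode's RPC host and port
# (commonly 8020 or 9000).
fs = hdfs.hdfs(host="namenode-host", port=8020, user=None, groups=None)

print(fs.list_directory("/"))                # uses the connection held by fs
print(fs.exists("/vs_co2_all_2019_v1.csv"))  # check a single path on that connection

# The module-level helpers resolve the filesystem from the path (or from the
# Hadoop configuration), so a fully qualified URI reaches the same cluster:
print(hdfs.ls("hdfs://namenode-host:8020/"))

fs.close()

Either way, it is the handle returned by the constructor, or a URI that names the cluster explicitly, that determines which filesystem is contacted.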

