
Read/write files on HDFS using Python

I am new to Python. I want to read a file from HDFS, which I have already managed to do.

After reading the file I perform some string operations, and I want to write the modified contents to an output file.

I read the file using subprocess (which took a lot of time), since open didn't work for me.

from subprocess import Popen, PIPE
cat = Popen(["hadoop", "fs", "-cat", "/user/hdfs/test-python/input/test_replace"], stdout=PIPE)

Now the question is how to write the modified contents to the output file.
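One way to stay with the subprocess approach is to pipe the modified text into `hadoop fs -put -`, where `-` tells `-put` to read from stdin. The following is only a sketch under assumptions: the output path, the `modify` function, and the helper names are hypothetical, and `-f` (overwrite) assumes a Hadoop 2 or later CLI on the PATH.

```python
from subprocess import PIPE, run

INPUT_PATH = "/user/hdfs/test-python/input/test_replace"
# Hypothetical output location; the question does not give one.
OUTPUT_PATH = "/user/hdfs/test-python/output/test_replace"


def modify(text):
    """Placeholder for the string operations: replace tabs with commas."""
    return text.replace("\t", ",")


def cat_command(path):
    """Build the CLI call that streams an HDFS file to stdout."""
    return ["hadoop", "fs", "-cat", path]


def put_command(path):
    """Build the CLI call that writes stdin to an HDFS file.

    "-" makes -put read from stdin; -f overwrites an existing file.
    """
    return ["hadoop", "fs", "-put", "-f", "-", path]


def rewrite(src, dst):
    """Read src from HDFS, modify it in memory, and write it to dst."""
    cat = run(cat_command(src), stdout=PIPE, check=True)
    modified = modify(cat.stdout.decode("utf-8"))
    run(put_command(dst), input=modified.encode("utf-8"), check=True)


# Requires the hadoop CLI and a reachable cluster:
# rewrite(INPUT_PATH, OUTPUT_PATH)
```

This holds the whole file in memory, so it suits small files; each `hadoop` call also starts a JVM, which is a likely reason the subprocess route feels slow.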

Your help is highly appreciated.

You can use a library to read from and write to HDFS, for example https://github.com/mtth/hdfs
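A minimal sketch of what that could look like with the `hdfs` package from that repository (installed with `pip install hdfs`), which talks to the cluster over WebHDFS. The NameNode URL, the user name, the output path, and the `modify` function are assumptions for illustration; WebHDFS must be enabled on the cluster.

```python
def modify(text):
    """Placeholder for the string operations: swap one word for another."""
    return text.replace("old", "new")


def rewrite(client, src, dst):
    """Read src through the client, modify it, and write it to dst."""
    # client.read is a context manager; with an encoding it yields text.
    with client.read(src, encoding="utf-8") as reader:
        content = reader.read()
    # overwrite=True replaces dst if it already exists.
    client.write(dst, data=modify(content), encoding="utf-8", overwrite=True)


# Usage (assumed URL and user; the WebHDFS port is commonly 50070 on
# Hadoop 2 and 9870 on Hadoop 3):
# from hdfs import InsecureClient
# client = InsecureClient("http://namenode:50070", user="hdfs")
# rewrite(client,
#         "/user/hdfs/test-python/input/test_replace",
#         "/user/hdfs/test-python/output/test_replace")  # hypothetical output path
```

Passing the client in as a parameter keeps the read-modify-write logic independent of how the connection is set up.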
