简体   繁体   中英

HDFS: Read data from HDFS to parse XML files in HDFS using Python3

I have about 1500 XML files in HDFS, each of them is about 2-3Gb. I need to write a python script to parse the XML files to perform MapReduce. However, I am facing issue to access the files in HDFS using python.

I tried the following script, and receive an error.

from snakebite.client import Client
def connection():
hadoop_client = Client('HDFS_hostname', 'HDFS_port', use_trash=False)
for x in hadoop_client.ls(['/']):
    print(x)

Following is the error:

Traceback (most recent call last):
  File "/home/ubuntu/PycharmProjects/textmining/read_data_from_HDFS.py", line 5, in <module>
    from snakebite.client import Client
  File "/usr/local/lib/python3.6/dist-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
                            ^
SyntaxError: invalid syntax

What is the best recommended way to access files from HDFS using python?

pip install snakebite-py3 

这将帮助您解决该问题...

i came acroos the same issue. snakebite is not comaptible with python 3.xu can use it with python 2.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM