I have about 1500 XML files in HDFS, each about 2-3 GB. I need to write a Python script that parses the XML files for a MapReduce job. However, I am having trouble accessing the files in HDFS from Python.
I tried the following script and received an error.
from snakebite.client import Client
def connection():
    hadoop_client = Client('HDFS_hostname', 'HDFS_port', use_trash=False)
    for x in hadoop_client.ls(['/']):
        print(x)
Following is the error:
Traceback (most recent call last):
  File "/home/ubuntu/PycharmProjects/textmining/read_data_from_HDFS.py", line 5, in <module>
    from snakebite.client import Client
  File "/usr/local/lib/python3.6/dist-packages/snakebite/client.py", line 1473
    baseTime = min(time * (1L << retries), cap);
                            ^
SyntaxError: invalid syntax
What is the recommended way to access files in HDFS from Python?
pip install snakebite-py3
This should help you resolve the issue...
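As a rough sketch of how the script from the question might look once snakebite-py3 is installed: this assumes the package keeps the original snakebite.client import path and that ls() yields one dict per entry with a 'path' key, as in the original snakebite. The helper names list_paths and make_client are made up for illustration.

```python
def list_paths(client, directory="/"):
    """Return the paths that client.ls() reports under `directory`."""
    # ls() takes a list of paths; each yielded entry is assumed to be a
    # dict with a 'path' key (assumption carried over from snakebite).
    return [entry["path"] for entry in client.ls([directory])]


def make_client(host, port):
    """Build a snakebite client (hypothetical helper, not part of snakebite)."""
    # Imported lazily so this file still loads where snakebite is not installed.
    from snakebite.client import Client
    # The port is converted to an integer here, on the assumption that
    # Client expects a numeric port rather than a string.
    return Client(host, int(port), use_trash=False)
```

To use it, replace the placeholders with your NameNode host and RPC port, then loop over list_paths(make_client(host, port)) and print each path.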
I came across the same issue. snakebite is not compatible with Python 3.x; you can use it with Python 2.
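The incompatibility can be checked without installing anything. The line quoted in the traceback uses the Python 2 long-integer suffix (1L), which Python 3 no longer accepts. The small check below is only an illustration: it compiles that line, and a variant with the suffix removed, to show which one the running interpreter accepts. The helper name compiles is made up for this example.

```python
import sys

# The line from snakebite/client.py quoted in the traceback (trailing ';' dropped).
OFFENDING_LINE = "baseTime = min(time * (1L << retries), cap)"
# The same line without the Python 2 long-integer suffix.
FIXED_LINE = "baseTime = min(time * (1 << retries), cap)"


def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


print("Python", sys.version_info.major)
print("original line compiles:", compiles(OFFENDING_LINE))
print("line without 'L' compiles:", compiles(FIXED_LINE))
```

Under Python 3 the original line is rejected at compile time, which is why the import itself fails before any HDFS connection is attempted.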