Reading / Writing Files from hdfs using python with subprocess, Pipe, Popen gives error
I am trying to read (open) and write files in HDFS from a Python script, but I am getting errors. Can someone tell me what is wrong here?
Code (full): sample.py
#!/usr/bin/python
from subprocess import Popen, PIPE
print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
print "After Loop 2"
for line in cat.stdout:
    line += "Blah"
    print line
    print "Inside Loop"
    put.stdin.write(line)
cat.stdout.close()
cat.wait()
put.stdin.close()
put.wait()
When I execute:
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar -file ./sample.py -mapper './sample.py' -input sample.txt -output fileRead
it runs fine, but I cannot find the file modifiedfile.txt that should have been created in HDFS.
When I execute:
hadoop fs -getmerge ./fileRead/ file.txt
in file.txt I get:
Before Loop
Before Loop
After Loop 1
After Loop 1
After Loop 2
After Loop 2
Can someone tell me what I am doing wrong? I don't think it is reading from sample.txt at all.
Try changing your put subprocess to take the stdout of your cat directly, by changing this:
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=PIPE)
into this:
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
This way put inherits the read end of cat's pipe directly, so the data flows between the two hadoop commands without having to pass through your Python loop at all.
Full script:
#!/usr/bin/python
from subprocess import Popen, PIPE
print "Before Loop"
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"],
            stdout=PIPE)
print "After Loop 1"
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
put.communicate()
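As a side note (not part of the original answer), when chaining pipes like this the Python subprocess documentation recommends closing the parent's copy of cat.stdout after starting put, so that cat can receive SIGPIPE if put exits early. A minimal sketch of that variant, using the same file names as the question:

#!/usr/bin/python
from subprocess import Popen, PIPE

# same pipeline; closing the parent's pipe handle lets "hadoop fs -cat"
# receive SIGPIPE if "hadoop fs -put" exits first
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"], stdout=PIPE)
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"],
            stdin=cat.stdout)
cat.stdout.close()  # the parent no longer needs this end of the pipe
put.communicate()   # wait for put to finish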
Your sample.py is probably not a correct mapper. A mapper typically accepts its input on stdin and writes its results to stdout, e.g., blah.py:
#!/usr/bin/env python
import sys
for line in sys.stdin:  # print("Blah\n".join(sys.stdin) + "Blah\n")
    line += "Blah"
    print(line)
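If the intent is to append "Blah" to the end of each line rather than on a line of its own (an assumption; the code above keeps each line's trailing newline, so "Blah" lands on the following line), a small variant strips the newline first and is invoked the same way:

#!/usr/bin/env python
import sys

for line in sys.stdin:
    # strip the trailing newline so "Blah" is appended to the same line
    print(line.rstrip("\n") + "Blah")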
Usage:
$ hadoop ... -file ./blah.py -mapper './blah.py' -input sample.txt -output fileRead
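Once the job completes, the output part files can be inspected directly in HDFS (assuming the default part-file naming), e.g.:

$ hadoop fs -cat ./fileRead/part-*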