
Hadoop commands from python

I am trying to get some stats for a directory in HDFS: the number of files/subdirectories and the size of each. I started out thinking that I could do this in bash.

#!/bin/bash
# Capture the listing, then count its lines (one per entry)
OP=$(hadoop fs -ls hdfs://mydirectory)
echo "$OP" | wc -l

I only have this much so far, and I quickly realised that Python might be a better option. However, I am not able to figure out how to execute Hadoop commands like hadoop fs -ls from Python.

Try the following snippet:

import subprocess

# List the HDFS directory and print each line of the command's output
output = subprocess.Popen(["hadoop", "fs", "-ls", "/user"],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
for line in output.stdout:
    print(line)

Additionally, you can refer to this subprocess example, where you can get the return status, output and error message separately.
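For instance, here is a minimal sketch of that approach, using communicate() to collect everything at once (the path /user is just a placeholder):

import subprocess

# Run the command and gather exit status, stdout and stderr separately
process = subprocess.Popen(["hadoop", "fs", "-ls", "/user"],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = process.communicate()  # waits for the command to finish
print("status:", process.returncode)
print("output:", out)
print("error:", err)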

See https://docs.python.org/2/library/commands.html for your options, including how to get the return status (in case of an error). The basic code you're missing is

import commands

# getoutput returns the command's output (stdout and stderr) as one string
hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')
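To tie this back to the question's goal, a minimal sketch of counting the listed entries from that string (assuming the usual "Found N items" header line that hadoop fs -ls prints first):

import commands

# Count the listed entries; the first line is the "Found N items" header
hdir_list = commands.getoutput('hadoop fs -ls hdfs://mydirectory')
num_entries = len(hdir_list.splitlines()) - 1
print(num_entries)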

Yes: deprecated in 2.6, still useful in 2.7, but removed from Python 3. If that bothers you, switch to

os.system(<command string>)

... or better yet, use subprocess.call (introduced in 2.4).
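For example, a brief sketch of the subprocess.call route (the directory path is a placeholder):

import subprocess

# subprocess.call runs the command and returns its exit status
status = subprocess.call(['hadoop', 'fs', '-ls', 'hdfs://mydirectory'])
if status != 0:
    print('hadoop fs -ls failed with exit status', status)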
