Hive trying to read current working directory when called in Python script
I am attempting to execute a Hive script from a Python wrapper. Part of the code looks like:
print(HiveArgs)
Hive = subprocess.Popen(HiveArgs, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
HiveOutput = Hive.communicate()
print("Out:" + HiveOutput[0])
print("=================================")
print("Err:" + HiveOutput[1])
The output of this is:
['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f ', '/edw/edwdev/tmp/test.hql.tmp']
Out:
=================================
Err:
Logging initialized using configuration in file:/etc/hive/2.5.0.2-3/0/hive-log4j.properties
Exception in thread "main" java.io.FileNotFoundException: File file:/data/edw/edwdev/ does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:348)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:782)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:427)
at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:439)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
where /data/edw/edwdev/ (the file that Hive thinks is missing) is my working directory on the Linux server. Changing the working directory to the script's location doesn't help; using relative and absolute paths also makes no difference. If I copy the values from the printed HiveArgs and execute the command from a terminal (hive -i /edw/edwdev/tmp/spark.txn.init.tmp -f /edw/edwdev/tmp/test.hql.tmp), it works correctly.
What am I missing here?
It turned out that the issue was with the Hive arguments. The print(HiveArgs) line gave this output:
['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f ', '/edw/edwdev/tmp/test.hql.tmp']
The arguments passed are '-f ' and '-i ' (with trailing spaces) instead of '-f' and '-i'.
I am not sure why this causes Hive to read the current working directory as an input file; most likely Hive does not trim its arguments, which leads to this behavior. Removing the trailing spaces fixed the issue.
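A minimal sketch of the fix: strip each element of the argument list before handing it to subprocess.Popen. The HiveArgs list below is reconstructed from the printed output above, with the trailing spaces that triggered the bug.

```python
import subprocess  # used when the cleaned list is passed to Popen

# Argument list as it was printed, with trailing spaces in '-i ' and '-f '.
HiveArgs = ['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp',
            '-f ', '/edw/edwdev/tmp/test.hql.tmp']

# Strip stray whitespace from every argument before invoking Hive.
HiveArgs = [arg.strip() for arg in HiveArgs]

print(HiveArgs)
# ['hive', '-i', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f', '/edw/edwdev/tmp/test.hql.tmp']
```

After cleaning, the list can be passed to subprocess.Popen exactly as in the original snippet. Stripping at this point also guards against the same mistake wherever the argument list is built.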