Hive trying to read current working directory when called in Python script
I am attempting to execute a Hive script from a Python wrapper. Part of the code looks like:
print(HiveArgs)
Hive = subprocess.Popen(HiveArgs, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
HiveOutput = Hive.communicate()
print("Out:" + HiveOutput[0])
print("=================================")
print("Err:" + HiveOutput[1])
The output of this is:
['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f ', '/edw/edwdev/tmp/test.hql.tmp']
Out:
=================================
Err:
Logging initialized using configuration in file:/etc/hive/2.5.0.2-3/0/hive-log4j.properties
Exception in thread "main" java.io.FileNotFoundException: File file:/data/edw/edwdev/ does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:348)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:782)
at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:427)
at org.apache.hadoop.hive.cli.CliDriver.processInitFiles(CliDriver.java:439)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:708)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
where /data/edw/edwdev/ (the file that Hive thinks is missing) is my working directory on the Linux server. Changing the working directory to the script's location doesn't help; using relative and absolute paths also makes no difference. If I copy the values from the printed HiveArgs and execute the command from a terminal (hive -i /edw/edwdev/tmp/spark.txn.init.tmp -f /edw/edwdev/tmp/test.hql.tmp), it works correctly.
What am I missing here?
It turned out that the issue was with the Hive arguments. The print(HiveArgs) line gave this output:
['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f ', '/edw/edwdev/tmp/test.hql.tmp']
The arguments passed are '-f ' and '-i ' (with trailing spaces) instead of '-f' and '-i'.
I am not sure why this causes Hive to read the current working directory as an input file; most likely Hive does not trim its arguments, which leads to this behavior. Removing the trailing spaces fixed the issue.
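A minimal sketch of the fix: strip each element of the argument list before handing it to subprocess.Popen. The HiveArgs list below is reconstructed from the printed output above, with the trailing spaces that triggered the bug.

```python
import subprocess  # used when the cleaned list is passed to Popen

# Argument list as it was printed, with trailing spaces in '-i ' and '-f '.
HiveArgs = ['hive', '-i ', '/edw/edwdev/tmp/spark.txn.init.tmp',
            '-f ', '/edw/edwdev/tmp/test.hql.tmp']

# Strip stray whitespace from every argument before invoking Hive.
HiveArgs = [arg.strip() for arg in HiveArgs]

print(HiveArgs)
# ['hive', '-i', '/edw/edwdev/tmp/spark.txn.init.tmp', '-f', '/edw/edwdev/tmp/test.hql.tmp']
```

After cleaning, the list can be passed to subprocess.Popen exactly as in the original snippet. Stripping at this point also guards against the same mistake wherever the argument list is built.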