How to use Hive without Hadoop
I am new to NoSQL solutions and want to play with Hive. But installing HDFS/Hadoop takes a lot of resources and time (maybe because I lack experience, but I have no time for this).
Is there a way to install and use Hive on a local machine without HDFS/Hadoop?
Yes, you can run Hive without Hadoop:
1. Create your warehouse on your local system.
2. Set the default fs to file:///
Then you can run Hive in local mode, without a Hadoop installation.
In hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <!-- this should eventually be deprecated since the metastore should supply this -->
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///tmp</value>
    <description></description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>file:///tmp</value>
  </property>
</configuration>
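The two steps above (create a local warehouse, point the config at file:///) can be sketched as a small script that writes this hive-site.xml and creates the warehouse directory. HIVE_CONF_DIR and WAREHOUSE_DIR are assumptions here; point them at your own Hive conf directory and preferred warehouse location.

```shell
# Sketch: generate a minimal hive-site.xml for local mode and create the
# local warehouse directory. Paths are illustrative, not from the answer.
HIVE_CONF_DIR="${HIVE_CONF_DIR:-./conf}"
WAREHOUSE_DIR="${WAREHOUSE_DIR:-/tmp/hive-warehouse}"

mkdir -p "$HIVE_CONF_DIR" "$WAREHOUSE_DIR"

cat > "$HIVE_CONF_DIR/hive-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>file://$WAREHOUSE_DIR</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>file:///tmp</value>
  </property>
</configuration>
EOF

echo "wrote $HIVE_CONF_DIR/hive-site.xml"
```

After this, `schematool -initSchema -dbType derby` and then `hive` should work in local mode.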
I would recommend you use something like this:
http://hortonworks.com/products/hortonworks-sandbox/
It's a fully functional VM with everything you need to start right away.
If you are just talking about experiencing Hive before making a decision, you can use a preconfigured VM as @Maltram suggested (Hortonworks, Cloudera, IBM and others all offer such VMs).
What you should keep in mind is that you will not be able to use Hive in production without Hadoop and HDFS, so if that is a problem for you, you should consider alternatives to Hive.
You can't. If you just download Hive and run:
./bin/hiveserver2
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
Hadoop is like a core, and Hive needs some libraries from it.
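As a hedged sketch of the usual fix: you do not need running Hadoop daemons, only the Hadoop client libraries on disk, so unpacking a Hadoop tarball and pointing HADOOP_HOME at it is enough for hiveserver2 to start. The versions and directories below are assumptions; adjust them to whatever you actually downloaded.

```shell
# Assumption: hadoop-3.3.1 and apache-hive-3.1.2-bin are unpacked under
# $HOME; change these paths to match your own downloads.
export HADOOP_HOME="$HOME/hadoop-3.3.1"
export HIVE_HOME="$HOME/apache-hive-3.1.2-bin"
export PATH="$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH"

# Hive only needs the Hadoop client jars on the classpath; no NameNode or
# DataNode has to be running for local mode.
echo "HADOOP_HOME=$HADOOP_HOME"
```

With HADOOP_HOME set this way, the "Cannot find hadoop installation" error above goes away.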
Update: this answer is out-of-date. With Hive on Spark it is no longer necessary to have HDFS support.
Hive requires HDFS and MapReduce, so you will need them. The other answer has some merit in recommending a simple, pre-configured means of getting all of the components set up for you.
But the gist of it is: Hive needs Hadoop and MapReduce, so to some degree you will have to deal with it.
Although there are some details you have to keep in mind, it is completely normal to use Hive without HDFS. There are a few details one should keep in mind:
1. Initialize your database by manually calling schematool.
2. You can use a site.xml file pointing to the local POSIX filesystem, but you can also set those options in the HIVE_OPTS environment variable.
I covered that, with examples of errors I've seen, in my blog post.
The top answer works for me, but a few more setup steps are needed. I spent quite some time searching around to fix multiple problems until I finally got it set up. Here I summarize the steps from scratch:
Setup hive-env.sh

$ cd hive/conf
$ cp hive-env.sh.template hive-env.sh
Add the following environment variables in hive-env.sh (change the paths according to your actual Java/Hadoop versions):

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=${bin}/../../hadoop-3.3.1
Setup hive-site.xml

$ cd hive/conf
$ cp hive-default.xml.template hive-site.xml
Replace all the ${system:***} variables with constant paths (not sure why these are not recognized on my system). Set the database path to local with the following properties (copied from the top answer):

<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <!-- this should eventually be deprecated since the metastore should supply this -->
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///tmp</value>
    <description></description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>file:///tmp</value>
  </property>
</configuration>
Setup hive-log4j2.properties (optional, good for troubleshooting)

$ cp hive-log4j2.properties.template hive-log4j2.properties

Replace all the ${sys:***} variables with constant paths.

Setup metastore_db
If you run hive directly and then do any DDL, you will get this error:

FAILED: HiveException org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
In that case we need to recreate metastore_db with the following commands:

$ cd hive/bin
$ rm -rf metastore_db
$ ./schematool -initSchema -dbType derby
$ cd hive/bin
$ ./hive
Now you should be able to run Hive on your local file system. One thing to note: metastore_db will always be created in your current directory. If you start hive from a different directory, you will need to recreate it.
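One way around that pitfall (my own workaround, not part of the answer above) is a tiny wrapper that always changes into a fixed data directory before launching hive, so metastore_db lands in the same place every time. HIVE_DATA_DIR and HIVE_HOME are assumptions; adjust them to your layout.

```shell
# Sketch: generate a wrapper so hive always starts from one directory and
# therefore always finds the same metastore_db.
HIVE_DATA_DIR="${HIVE_DATA_DIR:-$HOME/hive-data}"
mkdir -p "$HIVE_DATA_DIR"

cat > "$HIVE_DATA_DIR/run-hive.sh" <<'EOF'
#!/bin/sh
# metastore_db is created relative to $PWD, so pin the working directory
# to the wrapper's own location before exec'ing hive.
cd "$(dirname "$0")" || exit 1
exec "$HIVE_HOME/bin/hive" "$@"
EOF
chmod +x "$HIVE_DATA_DIR/run-hive.sh"

echo "wrapper at $HIVE_DATA_DIR/run-hive.sh"
```

Launching Hive through this wrapper instead of `./hive` means you never have to re-run schematool just because you started from the wrong directory.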