
How to use Hive without Hadoop

I am new to NoSQL solutions and want to play with Hive. But installing HDFS/Hadoop takes a lot of resources and time (maybe that's down to inexperience, but I have no time for it).

Are there ways to install and use Hive on a local machine without HDFS/Hadoop?

Yes, you can run Hive without Hadoop: 1. create your warehouse on your local system; 2. set the default fs to file:/// — then you can run Hive in local mode without a Hadoop installation.

In hive-site.xml:

<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> 
<configuration>
      <property>
         <name>hive.metastore.schema.verification</name> 
         <value>false</value> 
      </property> 
     <property> 
      <!-- this should eventually be deprecated since the metastore should supply this --> 
        <name>hive.metastore.warehouse.dir</name> 
        <value>file:///tmp</value>
        <description></description> 
     </property>
     <property> 
        <name>fs.default.name</name> 
        <value>file:///tmp</value> 
     </property> 
</configuration>
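Alternatively (a sketch, not from the answer above), the same three properties can be passed per invocation with `-hiveconf` instead of editing hive-site.xml:

```shell
# equivalent to the hive-site.xml settings above, passed on the command line
# (assumes the hive binary is already on your PATH)
hive -hiveconf hive.metastore.schema.verification=false \
     -hiveconf hive.metastore.warehouse.dir=file:///tmp \
     -hiveconf fs.default.name=file:///tmp
```

This is handy for a quick experiment, since it leaves the default configuration files untouched.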

I would recommend you use something like this:

http://hortonworks.com/products/hortonworks-sandbox/

It's a fully functional VM with everything you need to get started right away.

If you just want to experience Hive before making a decision, you can use a preconfigured VM as @Maltram suggested (Hortonworks, Cloudera, IBM, and others all offer such VMs).

What you should keep in mind is that you will not be able to use Hive in production without Hadoop and HDFS, so if that is a problem for you, you should consider alternatives to Hive.

You can't. Just download Hive and run:

./bin/hiveserver2                                                                                                                                        
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path

Hadoop is like a core, and Hive needs some libraries from it.
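In other words, Hive only needs to find an unpacked Hadoop distribution on disk; no daemons have to be running. A minimal sketch (the version and path are hypothetical, adjust to whatever you downloaded):

```shell
# point Hive at an unpacked Hadoop release so it can find the libraries it needs
export HADOOP_HOME=/opt/hadoop-2.10.2   # hypothetical install location
export PATH=$HADOOP_HOME/bin:$PATH
./bin/hiveserver2                       # no longer complains about $HADOOP_HOME
```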

Update: this answer is out of date. With Hive on Spark it is no longer necessary to have HDFS support.


Hive requires HDFS and MapReduce, so you will need them. The other answer has some merit in that it recommends a simple, pre-configured way of getting all of the components set up for you.

But the gist of it is: Hive needs Hadoop and M/R, so to some degree you will have to deal with them.

It's completely normal to use Hive without HDFS, although there are a few details you have to keep in mind:

  1. As a few commenters mentioned above, you'll still need some .jar files from hadoop-common.
  2. As of today (December 2020) it's difficult to run a Hive/Hadoop 3 pair. Use stable Hadoop 2 with Hive 2.
  3. Make sure POSIX permissions are set correctly, so your local Hive can access the warehouse and, eventually, the Derby database location.
  4. Initialize your database with a manual call to schematool.

You can use a site.xml file pointing to the local POSIX filesystem, but you can also set those options in the HIVE_OPTS environment variable. I covered that, along with examples of the errors I've seen, in my blog post.
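For example (a sketch reusing the same properties shown elsewhere in this thread, plus a local job tracker — the exact set you need may differ), the options can go into HIVE_OPTS so hive picks them up without any site.xml:

```shell
# assumed equivalent of the hive-site.xml settings, set via the environment
export HIVE_OPTS='-hiveconf mapred.job.tracker=local \
  -hiveconf fs.default.name=file:///tmp \
  -hiveconf hive.metastore.warehouse.dir=file:///tmp'
hive
```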

The top answer works for me, but it needs a few more setup steps. I spent quite some time searching around to fix multiple problems until I finally got it set up. Here I summarize the steps from scratch:

  • Download Hive and decompress it
  • Download Hadoop, decompress it, and put it in the same parent folder as Hive
  • Set up hive-env.sh
     $ cd hive/conf
     $ cp hive-env.sh.template hive-env.sh
    Add the following environment variables in hive-env.sh (change the paths according to your actual Java/Hadoop versions):
     export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
     export PATH=$JAVA_HOME/bin:$PATH
     export HADOOP_HOME=${bin}/../../hadoop-3.3.1
  • Set up hive-site.xml
     $ cd hive/conf
     $ cp hive-default.xml.template hive-site.xml
    Replace all the ${system:***} variables with constant paths (not sure why these are not recognized on my system). Set the database path to local with the following properties (copied from the top answer):
     <configuration>
       <property>
         <name>hive.metastore.schema.verification</name>
         <value>false</value>
       </property>
       <property>
         <!-- this should eventually be deprecated since the metastore should supply this -->
         <name>hive.metastore.warehouse.dir</name>
         <value>file:///tmp</value>
         <description></description>
       </property>
       <property>
         <name>fs.default.name</name>
         <value>file:///tmp</value>
       </property>
     </configuration>
  • Set up hive-log4j2.properties (optional, good for troubleshooting)
     cp hive-log4j2.properties.template hive-log4j2.properties
    Replace all the ${sys:***} variables with constant paths
  • Set up metastore_db. If you run hive directly, you will get the following error when executing any DDL:
     FAILED: HiveException org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:Hive metastore database is not initialized. Please use schematool (eg ./schematool -initSchema -dbType...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (eg ? createDatabaseIfNotExist=true for mysql))
    In that case we need to recreate metastore_db with the following commands:
     $ cd hive/bin
     $ rm -rf metastore_db
     $ ./schematool -initSchema -dbType derby
  • Start Hive
     $ cd hive/bin
     $ ./hive

Now you should be able to run Hive on your local file system. One thing to note: the metastore_db will always be created in your current directory. If you start Hive from a different directory, you will need to recreate it.
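One way to avoid recreating metastore_db per directory (my own suggestion, not from the answer above) is to pin the Derby database to an absolute path in hive-site.xml via the `javax.jdo.option.ConnectionURL` property; the path below is hypothetical:

```xml
<!-- pin the embedded Derby metastore to one absolute location -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/user/hive/metastore_db;create=true</value>
</property>
```

With this set, hive uses the same metastore no matter which directory you start it from.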


 