使用Hive + Cassandra社區時遇到的問題

Question

我正在嘗試使用HIVE 0.13來訪問用CQL3創建的cassandra 2.0.8列系列。

這是我創建列族的方式：

CREATE KEYSPACE IF NOT EXISTS Identification
  WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy',
  'DC1' : 2 };

USE Identification;

CREATE TABLE IF NOT EXISTS entitylookup (
  name varchar,
  value varchar,
  entity_id uuid,
  PRIMARY KEY ((name, value), entity_id))
WITH
    caching=all
;

我遵循了該項目自述文件中的指示： https : //github.com/tuplejump/cash/tree/master/cassandra-handler

我生成了hive-cassandra-1.2.6.jar，將其復制並將cassandra-all-1.2.6.jar，cassandra-thrift-1.2.6.jar復制到hive lib文件夾。

然后，我開始配置單元並嘗試以下操作：

CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "localhost", "cassandra.port "= "9160")
TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");

這是輸出：

hive> mvalle@mvalle:~/hadoop$ hive
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/05/30 12:02:02 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/mvalle/hadoop/apache-hive-0.13.0-bin/lib/hive-common-0.13.0.jar!/hive-log4j.properties
OpenJDK 64-Bit Server VM warning: You have loaded library /home/mvalle/hadoop/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
hive> CREATE EXTERNAL TABLE identification.entitylookup(name string, value string, entity_id binary)
    > STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "ident.s1mbi0se.com", "cassandra.port "= "9160")
    > TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':2", "cassandra.ks.strategy"="NetworkTopologyStrategy");
FAILED: SemanticException [Error 10072]: Database does not exist: identification

問題：如何獲得更多有關發生問題的信息？ 我使用“標識”（大寫I）嘗試了相同的蜂巢突擊隊，但結果相同。 是否可以在cassandra社區中訪問CQL3色譜柱族？ 似乎鍵空間尚未映射，但是那時我不知道如何映射。 在DSE中，它們會自動映射...

編輯：

為了進一步說明，如果我創建一個空數據庫，然后嘗試創建外部表，這是我得到的：

hive> create database identification;
OK
Time taken: 0.154 seconds
hive> CREATE EXTERNAL TABLE identification.entity_lookup(name string, value string, entity_id binary)
    > STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler' WITH SERDEPROPERTIES("cql.primarykey" = "name, value", "cassandra.host" = "localhost", "cassandra.port "= "9160")
    > TBLPROPERTIES ("cassandra.ks.name" = "identification", "cassandra.ks.stratOptions"="'DC1':3", "cassandra.ks.strategy"="NetworkTopologyStrategy");
OK
Time taken: 3.58 seconds
hive> select * from identification.entity_lookup limit 10;
OK
Exception in thread "main" java.lang.InstantiationError: org.apache.hadoop.mapreduce.JobContext
    at org.apache.hadoop.hive.cassandra.input.cql.HiveCqlInputFormat.getSplits(HiveCqlInputFormat.java:166)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:418)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:561)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:534)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:137)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1488)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:285)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Answer 1

錯誤的原因不是Cash不能映射鍵空間，而是因為蜂巢中沒有數據庫。

只需在蜂巢中使用創建數據庫，

CREATE DATABASE identification;

那應該使它工作。

使用Hive + Cassandra社區時遇到的問題

問題描述

1 個解決方案

解決方案1
0 2014-07-15 08:18:49

使用Hive + Cassandra社區時遇到的問題

問題描述

1 個解決方案

解決方案1 0 2014-07-15 08:18:49

解決方案1
0 2014-07-15 08:18:49