简体   繁体   中英

WSO2BAM REST stream input to BAM/Cassandra; can't get to the EVENT_KS data using hive query?

The background for this question is essentially an article written by Sachini Jayasekara @ WSO2 called Using Different Reporting Frameworks with WSO2 Business Activity Monitor . I am doing more or less exactly the same, but using rather the REST API to define a data stream and invoke the REST WS API to push data into BAM. Then use the HIVE queries to get to the data. However, it seems that I have missed something, as the attribute data is not shown. Hence the query.

Currently using the REST api which is invoked through a Perl based daemon. This invokes the REST API using the following streams definition and payload:

{
  "name":"currentcostRealtime2.stream",
  "version": "1.0.6",
  "nickName": "Currentcost Realtime",
  "description": "This is the Currentcost realtime stream",
  "payloadData":[
    {
      "name":"sensor",
      "type":"INT"
    },
    {
      "name":"temp",
      "type":"FLOAT"
    },
    {
      "name":"timestamp",
      "type":"STRING"
    },
    {
      "name":"watt",
      "type":"INT"
    }
  ]
}

.. and payload definition ..

[
 {
   "payloadData" : [SENSOR, TEMP, "TIMESTAMP", WATT] ,
 }
]

I should note that the payload is string replaced before its committed; eg the actual payload that is committed looks like:

[
 {
   "payloadData" : [1, 18.7, "2014-06-15 16:15:56", 1] ,
 }
]

The queries execute with no apparent problem, but I am having now an issue with the HIVE query in BAM, which gives me entries output, but not the values. Eg trying to now execute the following HIVE query:

CREATE TABLE IF NOT EXISTS CurrentCostDataTemp ( sensor INT, temp FLOAT, ts TIMESTAMP, watt INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcostRealtime2_stream",
    "cassandra.columns.mapping" = "payload_sensor, payload_temp, payload_timestamp, payload_watt" );

select * from CurrentCostDataTemp;                                  

.. but this gives only the following (see specific picture below) - eg that there is NO attribute level data that is shown. However, it is evident that there are EVENT_KS entries given it outputs 4 rows.. so question is how do I reference the data to extract the values, or is there something else going on here that I am not aware of?:

key sensor  temp    ts  watt
1402816273765::192.168.1.106::9443::52              
1402815283659::192.168.1.106::9443::51              
1402815238323::192.168.1.106::9443::49              
1402815280532::192.168.1.106::9443::50              

Have verified that the data is in Cassandra by checking with Cqlsh - see here:

cqlsh:EVENT_KS> select * from "currentcostRealtime_stream";

 key                                    | Description                             | Name                       | Nick_Name            | StreamId                         | Timestamp     | Version | meta_ipAdd | payload_sensor | payload_temp | payload_timestamp   | payload_watt
----------------------------------------+-----------------------------------------+----------------------------+----------------------+----------------------------------+---------------+---------+------------+----------------+--------------+---------------------+--------------
 1402815283659::192.168.1.106::9443::51 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815283659 |   1.0.5 |       null |              1 |         18.7 | 2014-06-15 14:54:43 |            1
 1402815238323::192.168.1.106::9443::49 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815238323 |   1.0.5 |       null |              1 |         18.7 | 2014-06-15 14:53:58 |            1
 1402815280532::192.168.1.106::9443::50 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402815280532 |   1.0.5 |       null |              1 |         18.7 | 2014-06-15 14:54:40 |            1
 1402816273765::192.168.1.106::9443::52 | This is the Currentcost realtime stream | currentcostRealtime.stream | Currentcost Realtime | currentcostRealtime.stream:1.0.5 | 1402816273765 |   1.0.5 |       null |              1 |         18.7 | 2014-06-15 15:11:13 |            1

(4 rows)

cqlsh:EVENT_KS>

Most likely a minor issue only that I have overseen, but would be great if someone else have seen this and could respond as well..

When adding in a remote table definition to MySQL DB externally, the tables and all are created, but it seems like the problem is getting to the attribute data in the EVENT_KS table itself, and having that created and accessed through the HIVE script.

Thanks in advance!

/Jorgen

[UPDATE - Thursday 19th - SOLVED] Got it working with a few hints to this question. The following code works fine now, which is great.. greatly appreciated for the time to respond from you guys..

drop table CurrentCostDataTemp10;
drop table CurrentCostDataTemp_Summary10;

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp10 ( messageRowID STRING, payload_sensor INT, messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
  "cassandra.port" = "9160",
  "cassandra.ks.name" = "EVENT_KS",
  "cassandra.ks.username" = "<USER>",
  "cassandra.ks.password" = "<PASSWORD>",
  "cassandra.cf.name" = "currentcostsimple5_stream",
  "cassandra.columns.mapping" = ":key, payload_sensor, Timestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt" );

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp_Summary10 ( messageRowID STRING, payload_sensor INT, messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql STRING, payload_watt INT ) 
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
  'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
  'mapred.jdbc.url' = 'jdbc:mysql://127.0.0.1:8889/currentcost' ,
  'mapred.jdbc.username' = '<USER>',
  'mapred.jdbc.password' = '<PASSWORD>',
  'hive.jdbc.update.on.duplicate'= 'true',
  'hive.jdbc.primary.key.fields' = 'messageRowID',
  'hive.jdbc.table.create.query' = 'CREATE TABLE CurrentCostDataTemp1 ( messageRowID VARCHAR(100) NOT NULL PRIMARY KEY, payload_sensor TINYINT(4), messageTimestamp BIGINT, payload_temp FLOAT, payload_timestamp BIGINT, payload_timestampmysql DATETIME, payload_watt INT ) ');

insert overwrite table CurrentCostDataTemp_Summary10 select messageRowID, payload_sensor, messageTimestamp, payload_temp, payload_timestamp, payload_timestampmysql, payload_watt FROM CurrentCostDataTemp10;

Using Different Reporting Frameworks with WSO2 Business Activity Monitor. By Sachini Jayasekara

Try changing the 1st line of the script as follows.

CREATE EXTERNAL TABLE IF NOT EXISTS CurrentCostDataTemp ( key STRING , sensor INT, temp FLOAT, ts TIMESTAMP, watt INT)

(Remove key STRING part if it gives errors.)

Note: May be you will have to run DROP TABLE CurrentCostDataTemp before running above, in case it is already created, when you run it before.

I have amended your query as follows. Please try with that.

CREATE external TABLE IF NOT EXISTS CurrentCostDataTemp ( key string, sensor INT, temp FLOAT, ts TIMESTAMP, watt INT ) 
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.host" = "127.0.0.1",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.ks.username" = "admin",
    "cassandra.ks.password" = "admin",
    "cassandra.cf.name" = "currentcostRealtime2_stream",
    "cassandra.columns.mapping" = ":key,payload_sensor, payload_temp, payload_timestamp, payload_watt" );

select * from CurrentCostDataTemp;  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM