
Creating a structured Hive table from unstructured GPS packets in CSV format

I have a CSV file like the one shown below.

VTS,51,0071,9739965515,NM,GP,INF01,V,19,072219,291014,0000.0000,N,00000.0000,E,07AE
VTS,01,0097,9739965515,SP,GP,18,072253,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,169,B205
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072311,291014,0000.0000,N,00000.0000,E,C24E
VTS,01,0097,9739965515,NM,GP,19,072311,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,171,B358
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072319,291014,0000.0000,N,00000.0000,E,012F
VTS,51,0071,9739965515,NM,GP,INF01,V,19,072326,291014,0000.0000,N,00000.0000,E,B2E6
VTS,01,0097,9739965515,NM,GP,18,072326,V,0000.0000,N,00000.0000,E,0.0,0.0,291014,0000,00,4000,11,999,173,EAA0
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072333,291014,0000.0000,N,00000.0000,E,9896
VTS,51,0071,9739965515,NM,GP,INF01,V,18,072340,291014,0000.0000,N,00000.0000,E,9B23

This has to be mapped to the following fields:

pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum

The second field, gprs_pkt_id, marks a valid packet when its value is 01. My use case is to filter the CSV data down to valid packets only. I am using a regex for this, but I am not able to get the complete rows back. Any help will be deeply appreciated.

The Hive query I used is shown below.

CREATE EXTERNAL TABLE sky_track_testing1(
    pkt_header STRING,
    gprs_pkt_id STRING,
    pkt_length STRING,
    sim_no STRING,
    msg_id STRING,
    gprs_pkt STRING,
    gsm_sig_strength STRING,
    utc_time STRING,
    pkt_validation STRING,
    latitude STRING,
    direction_n_s STRING,
    longitude STRING,
    direction_e_w STRING,
    speed STRING,
    track_angle STRING,
    utc_date STRING,
    fuel_adc_values STRING,
    ignition STRING,
    odometer_values STRING,
    supply_int STRING,
    battery_adc STRING,
    pkt_id STRING,
    check_sum STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ( "input.regex" = "^(VTS,01).*$" )
STORED AS TEXTFILE
LOCATION '/user/root/sky_track';

This is definitely a wrong query. Please help me.

I recommend you use Pig for this:

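-- The input is comma-separated, so name the delimiter explicitly (PigStorage defaults to tab).
a = load '/user/root/sky_track' using PigStorage(',') as (pkt_header,gprs_pkt_id,pkt_length,sim_no,msg_id,gprs_pkt,gsm_sig_strength,utc_time,pkt_validation,latitude,direction_n_s,longitude,direction_e_w,speed,track_angle,utc_date,fuel_adc_values,ignition,odometer_values,supply_int,battery_adc,pkt_id,check_sum);
-- Keep only valid packets, i.e. rows whose second field is 01.
b = filter a by gprs_pkt_id == '01';
-- Write the filtered rows back out as comma-separated text.
store b into '/user/root/sky_track_valid' using PigStorage(',');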

Yes, as the answer above says, Pig is well suited to your data, so you can give it a try. If you would rather stay with Hive, see the example below (no regex is needed for your dataset).

hive> CREATE  TABLE sky_track_testing1(
    > pkt_header STRING,
    > gprs_pkt_id STRING,
    > pkt_length STRING,
    > sim_no STRING,
    > msg_id STRING,
    > gprs_pkt STRING,
    > gsm_sig_strength STRING,
    > utc_time STRING,
    > pkt_validation STRING,
    > latitude STRING,
    > direction_n_s  STRING,
    > longitude  STRING,
    > direction_e_w STRING,
    > speed STRING,
    > track_angle  STRING,
    > utc_date STRING,
    > fuel_adc_values STRING,
    > ignition  STRING,
    > odometer_values STRING,
    > supply_int  STRING,
    > battery_adc  STRING,
    > pkt_id  STRING,
    > check_sum STRING
    > ) 
    > ROW FORMAT
    > DELIMITED FIELDS TERMINATED BY ','
    > LINES TERMINATED BY '\n'
    > STORED AS TEXTFILE;
OK
Time taken: 0.1 seconds
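
The session above creates an empty managed table and does not show how the rows get into it. One way to fill it, as a rough sketch: it assumes the CSV files are still under /user/root/sky_track (the path from the question) and that it is acceptable for Hive to move them.

```sql
-- Assumption: the raw CSV files are in the HDFS directory from the question.
-- LOAD DATA INPATH moves (not copies) the files into the table's
-- warehouse directory, so the source directory is emptied afterwards.
LOAD DATA INPATH '/user/root/sky_track'
INTO TABLE sky_track_testing1;
```

If the files have to stay where they are, an external table with a LOCATION clause, as in the question, avoids the move.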

hive> select * from sky_track_testing1 where gprs_pkt_id='01';
OK
VTS 01  0097    9739965515  SP  GP  18  072253  V   0000.0000   N   00000.0000  E   0.0 0.0 291014  0000    00  4000    1999    169 B205
VTS 01  0097    9739965515  NM  GP  19  072311  V   0000.0000   N   00000.0000  E   0.0 0.0 291014  0000    00  4000    1999    171 B358
VTS 01  0097    9739965515  NM  GP  18  072326  V   0000.0000   N   00000.0000  E   0.0 0.0 291014  0000    00  4000    1999    173 EAA0
Time taken: 14.328 seconds
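
To keep the external table from the question and also save the valid packets the way the Pig script does, something like the following might work. It is only a sketch: the table names sky_track_raw and sky_track_valid are made up for illustration, and it assumes the files sit in /user/root/sky_track.

```sql
-- Hypothetical table name; the location is the one used in the question.
-- Plain comma-delimited parsing replaces the RegexSerDe.
CREATE EXTERNAL TABLE sky_track_raw (
    pkt_header STRING, gprs_pkt_id STRING, pkt_length STRING,
    sim_no STRING, msg_id STRING, gprs_pkt STRING,
    gsm_sig_strength STRING, utc_time STRING, pkt_validation STRING,
    latitude STRING, direction_n_s STRING, longitude STRING,
    direction_e_w STRING, speed STRING, track_angle STRING,
    utc_date STRING, fuel_adc_values STRING, ignition STRING,
    odometer_values STRING, supply_int STRING, battery_adc STRING,
    pkt_id STRING, check_sum STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/root/sky_track';

-- Hypothetical target table: keep only rows whose second field is 01.
CREATE TABLE sky_track_valid AS
SELECT *
FROM sky_track_raw
WHERE gprs_pkt_id = '01';
```

The type 51 packets in the sample have only 16 fields, so in sky_track_raw their values do not line up with the column names and the trailing columns come back as NULL. That does not matter here because the WHERE clause drops them.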

