
PIG - Filter or how to get inside a bag or tuple

As you can see, we can apply a filter to the first one because we can use an aggregate on the temperature. Now, how do we apply the second filter on strings?

We are only trying to filter e so that it keeps the conditions 'clear' and 'partly cloudy'.

Weather = LOAD 'hdfs:/home/hduser/final/Weather.csv' USING PigStorage(',');
A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, $14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;


group_by_day = GROUP A BY (year,month,day);

Schema:

   {day: (year: int, month: int, day: int), temperature: {(temp: int)}, condition: {(cond: bytearray)}, dewPoint: {(dewpoint: double)}, windSpeed: {(wind: double)}}
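The statement that produced this schema is not shown in the post. As a rough sketch only, a projection along the following lines over group_by_day would yield a schema of that shape. The relation name C is an assumption, taken from the filter that appears later in the thread, and the field aliases are copied from the schema above.

```pig
-- Hypothetical reconstruction: the original statement is not part of the post.
-- 'group' is the (year, month, day) key; A.<field> projects one column of the
-- grouped bag, giving a bag of single-field tuples.
C = FOREACH group_by_day GENERATE
        group      AS day,
        A.temp     AS temperature,
        A.cond     AS condition,
        A.dewpoint AS dewPoint,
        A.wind     AS windSpeed;

-- Print the resulting schema to compare it with the one shown above.
DESCRIBE C;
```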

You have to cast cond to chararray in the statement below. Since you have not specified data types in your LOAD statement, all fields are loaded as bytearray, which is the default data type used by PigStorage.

A = FOREACH Weather GENERATE (int)$0 AS year, (int)$1 AS month, (int)$2 AS day, (int)$4 AS temp, (chararray)$14 AS cond, (double)$5 as dewpoint , (double)$10 as wind;

EDIT

I was able to get the results by using the BagToString function. You can do the filtering in a single step:

D = FILTER C BY (MIN(temperature) >= 60 AND MAX(temperature) <= 79) AND (BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy');

Or in your case:

f = FILTER e BY BagToString(condition) == 'clear' OR BagToString(condition) == 'partly cloudy';
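One caveat worth checking: BagToString joins every element of the bag into a single string (the default delimiter is an underscore), so the comparisons above can only match when the condition bag for a day holds exactly one value. If a day can carry several readings, a nested FOREACH that filters inside the bag may be a safer route. The following is a minimal sketch, assuming cond has been cast to chararray as described above and that a day should be kept only when every reading is 'clear' or 'partly cloudy'. The names per_day, wanted, n_wanted and n_total are illustrative and do not come from the original post.

```pig
-- Sketch only: per_day, wanted, n_wanted and n_total are hypothetical names.
per_day = FOREACH group_by_day {
    -- keep just the readings whose condition is one of the two wanted values
    wanted = FILTER A BY cond == 'clear' OR cond == 'partly cloudy';
    GENERATE group         AS day,
             A.temp        AS temperature,
             A.cond        AS condition,
             COUNT(wanted) AS n_wanted,
             COUNT(A)      AS n_total;
};

-- keep the days on which all readings matched
f = FILTER per_day BY n_wanted == n_total;
```

To keep a day when at least one reading matches instead, the last filter could test n_wanted > 0.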

