Combination of MapReduce and Hive

Question

Can we use a combination of Hive and MapReduce?

Say I have a CSV file. I need to find the mean of a column and replace the null values in that column with the mean.

Can we run a Hive query in the driver (to find the mean) and then write a MapReduce job to replace the nulls with that mean?

Which is the better way:

writing only MapReduce code, or
using a combination of Hive and MapReduce?

Answer 1

Here is another approach, which solves this using only Hive.

Say your CSV input looks like this:

firstname,secondname,score,group
vijay,kumar,123,cse
satish,babu,,it
kumar,nagendra,200,eie
anil,babu,,it

Then run a query like this (I ran it and it worked):

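hive> from students s join (select avg(score) as avg from students) a
    > select s.firstname,
    > case
    > when s.score = '' or s.score is null then cast(a.avg as string)
    > else s.score
    > end as new_score;

The null check uses IS NULL because in Hive, as in standard SQL, a comparison such as score = NULL never evaluates to true.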

Total MapReduce jobs = 2

Output:

OK

firstname       new_score
vijay 123
satish 161.5
kumar 200
anil 161.5

Time taken: 67.059 seconds, Fetched: 4 row(s)

Answer 2

In my view, it is better to write only MapReduce code: use job 1 to find the mean, then a map-only job 2 to do the replacement, which is easy. Combining Hive with MR will be a bit messy, because you would write both in one program and ship it to the cluster nodes as a JAR, and you can't say where these tasks will run, i.e. where the Hive command would actually be executed.

Hope this helps. Thanks :)
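To make the two-job idea concrete, here is a minimal local sketch of the logic in plain Python. It is not Hadoop code: compute_mean stands in for job 1, fill_nulls for the map-only job 2, and those names (plus run and SAMPLE) are made up for illustration. The sample rows are the ones from Answer 1, and an empty cell is assumed to be what "null" means in the CSV.

```python
import csv
import io

# Sample rows taken from Answer 1; an empty score cell is treated as null.
SAMPLE = """firstname,secondname,score,group
vijay,kumar,123,cse
satish,babu,,it
kumar,nagendra,200,eie
anil,babu,,it
"""


def compute_mean(rows, column):
    """Pass 1 (stands in for job 1): mean of the non-empty values in column."""
    total, count = 0.0, 0
    for row in rows:
        value = row[column].strip()
        if value:  # skip empty (null) cells
            total += float(value)
            count += 1
    if count == 0:
        raise ValueError("no non-empty values in column " + column)
    return total / count


def fill_nulls(rows, column, mean):
    """Pass 2 (stands in for the map-only job 2): rows are handled independently."""
    for row in rows:
        if not row[column].strip():
            row = {**row, column: str(mean)}  # copy the row with the mean filled in
        yield row


def run(text=SAMPLE, column="score"):
    """Run both passes over CSV text and return (mean, filled rows)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    mean = compute_mean(rows, column)
    return mean, list(fill_nulls(rows, column, mean))


if __name__ == "__main__":
    mean, filled = run()
    for row in filled:
        print(row["firstname"], row["score"])
```

In a real MapReduce setup the mean produced by the first job would still have to be handed to the second one somehow (for example through the job configuration); that part is not shown here.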

Answer 1 (score 2, accepted): 2014-01-17 06:56:40
Answer 2 (score 0): 2014-01-17 04:58:34
