I want to port some R code to Hadoop so that it can run on Impala or Hive as a SQL-like query. The code I have is based on this question:
R data table: compare row value to group values, with condition
For each row, I want to count the rows that have the same id, belong to subgroup 1, and have a lower price.
Let's say I have the following data:
CREATE TABLE project
(
id int,
price int,
subgroup int
);
INSERT INTO project(id,price,subgroup)
VALUES
(1, 10, 1),
(1, 10, 1),
(1, 12, 1),
(1, 15, 1),
(1, 8, 2),
(1, 11, 2),
(2, 9, 1),
(2, 12, 1),
(2, 14, 2),
(2, 18, 2);
Here is the output I would like to have (with the new column cheaper):
id  price  subgroup  cheaper
1   10     1         0  (because no row is cheaper in id 1, subgroup 1)
1   10     1         0  (because no row is cheaper in id 1, subgroup 1)
1   12     1         2  (rows 1 and 2 are cheaper)
1   15     1         3
1   8      2         0  (no row is cheaper in id 1, subgroup 1)
1   11     2         2
2   9      1         0
2   12     1         1
2   14     2         2
2   18     2         2
Note that I always want to compare rows to the ones in subgroup 1, even when the rows are themselves in subgroup 2.
You can join the table with itself, using a LEFT JOIN:
SELECT
p.id,
p.price,
p.subgroup,
COUNT(p2.id) AS cheaper
FROM
project p LEFT JOIN project p2
ON p.id=p2.id AND p2.subgroup=1 AND p.price>p2.price
GROUP BY
p.id,
p.price,
p.subgroup
ORDER BY
p.id, p.subgroup
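COUNT(p2.id) counts only the rows where the join succeeds, that is, rows in subgroup 1 with the same id and a lower price. When no such row exists, the LEFT JOIN still keeps the row from p with p2.id set to NULL, and COUNT ignores NULLs, so the result is 0.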
The only problem is that you are expecting those two rows:
1 10 1 0
1 10 1 0
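but my query returns only one of them, because it groups by id, price, and subgroup, so identical rows collapse into a single group. For the same reason, a duplicated row that does have cheaper matches would get an inflated count, since every copy contributes its own matches to the shared group. If your project table has another column that uniquely identifies each row, add it to the GROUP BY to avoid both problems.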
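To see the effect of grouping by a unique row identifier, here is a minimal sketch that runs the LEFT JOIN query on the sample data. It uses Python's built-in sqlite3 module purely as a convenient local test bed, not Hive or Impala, and the row_id column is a hypothetical surrogate key that does not exist in the original table.

```python
import sqlite3

# Sample rows from the question: (id, price, subgroup).
ROWS = [
    (1, 10, 1), (1, 10, 1), (1, 12, 1), (1, 15, 1), (1, 8, 2),
    (1, 11, 2), (2, 9, 1), (2, 12, 1), (2, 14, 2), (2, 18, 2),
]


def left_join_counts(group_by_row_id):
    """Run the LEFT JOIN query, optionally grouping by the surrogate key."""
    conn = sqlite3.connect(":memory:")
    # row_id is a hypothetical unique key added only for this illustration;
    # SQLite fills an INTEGER PRIMARY KEY automatically on insert.
    conn.execute(
        "CREATE TABLE project ("
        "row_id INTEGER PRIMARY KEY, id INT, price INT, subgroup INT)"
    )
    conn.executemany(
        "INSERT INTO project (id, price, subgroup) VALUES (?, ?, ?)", ROWS
    )
    group_cols = "p.id, p.price, p.subgroup"
    if group_by_row_id:
        group_cols = "p.row_id, " + group_cols
    query = f"""
        SELECT p.id, p.price, p.subgroup, COUNT(p2.id) AS cheaper
        FROM project p
        LEFT JOIN project p2
          ON p.id = p2.id AND p2.subgroup = 1 AND p.price > p2.price
        GROUP BY {group_cols}
        ORDER BY p.id, p.subgroup, p.price
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(len(left_join_counts(False)))  # duplicates collapsed
    print(len(left_join_counts(True)))   # one output row per input row
    for row in left_join_counts(True):
        print(row)
```

On the sample data the first variant yields 9 rows and the second yields all 10, with the cheaper column matching the expected output in the question.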
Or you could use an inline query:
SELECT
p.id,
p.price,
p.subgroup,
(SELECT COUNT(*)
FROM project p2
WHERE p2.id=p.id AND p2.subgroup=1 AND p2.price<p.price) AS n
FROM
project p
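The inline query evaluates the count once per row of p, so duplicated rows are kept without needing a unique key. As a quick sanity check, here is a minimal sketch that runs it on the sample data, again using Python's sqlite3 as a local stand-in (an assumption for testing only; whether your Hive or Impala version accepts a correlated subquery in the SELECT list needs to be checked separately). The ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

# Sample rows from the question: (id, price, subgroup).
ROWS = [
    (1, 10, 1), (1, 10, 1), (1, 12, 1), (1, 15, 1), (1, 8, 2),
    (1, 11, 2), (2, 9, 1), (2, 12, 1), (2, 14, 2), (2, 18, 2),
]


def inline_query_counts():
    """Run the correlated-subquery version and return all result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE project (id INT, price INT, subgroup INT)")
    conn.executemany("INSERT INTO project VALUES (?, ?, ?)", ROWS)
    query = """
        SELECT p.id, p.price, p.subgroup,
               (SELECT COUNT(*)
                FROM project p2
                WHERE p2.id = p.id
                  AND p2.subgroup = 1
                  AND p2.price < p.price) AS n
        FROM project p
        ORDER BY p.id, p.subgroup, p.price
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in inline_query_counts():
        print(row)
```

This returns 10 rows, including both (1, 10, 1, 0) rows, which matches the expected output in the question.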