I would like to clean my data by removing every row whose "Fruit" value appears only once or twice in the table. It currently looks like this:
Fruit | Year | Units |
---|---|---|
apples | 2018 | 20000 |
oranges | 2018 | 600 |
apples/oranges | 2018 | 3000 |
oranges | 2017 | 6000 |
apples | 2016 | 2000 |
oranges | 2016 | 2000 |
apples | 2017 | 50000 |
potato | 2017 | 9000 |
apples/oranges | 2016 | 5000 |
I would like it to look like this:
Fruit | Year | Units |
---|---|---|
apples | 2018 | 20000 |
oranges | 2018 | 600 |
apples | 2017 | 50000 |
oranges | 2017 | 6000 |
apples | 2016 | 2000 |
oranges | 2016 | 2000 |
In reality the table contains many more "Fruit" values that appear only once or twice, so I cannot simply exclude them with a long WHERE clause.
Attempted solution
I've tried to simplify the data by using a subquery that counts the number of times a "Fruit" entry appears, then only displays rows where this count is more than two. The count works as a standalone query, but not inside the larger query that also includes the other columns.
SELECT "Fruit"
,count("Fruit") as cnt
,"Year"
,"Units"
FROM example_table
WHERE(SELECT count("Fruit") as cnt
FROM example_table
HAVING cnt > 2)
GROUP BY "Fruit"
,"Year"
,"Units"
This is the error message I get:
Invalid data type [NUMBER(18,0)] for predicate [(SELECT COUNT(EXAMPLE_TABLE."Fruit") AS "CNT" FROM EXAMPLE_TABLE AS EXAMPLE_TABLE HAVING CNT > 2)]
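The error occurs because a WHERE clause needs a boolean predicate, but your subquery returns a number (the count). One way to do this is to first find the fruit names that appear more than twice, then select only the rows whose "Fruit" is in that list.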
SELECT *
FROM example_table
WHERE "Fruit" IN (
    -- fruit names that appear in more than two rows
    SELECT "Fruit"
    FROM example_table
    GROUP BY "Fruit"
    HAVING COUNT("Fruit") > 2)
;
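If the count is also wanted next to the other columns, as in the attempted query, one possible variation is to join the table to its own per-fruit counts. This is only a sketch: it assumes the table name example_table and the quoted column names from the question, and the aliases t, c and cnt are made up for illustration.

```sql
-- Sketch: keep the per-fruit row count alongside the original columns.
-- Assumes example_table and the quoted column names from the question;
-- the aliases t, c and cnt are illustrative only.
SELECT t."Fruit",
       c.cnt,
       t."Year",
       t."Units"
FROM example_table AS t
JOIN (
    -- one row per fruit with the number of rows it appears in
    SELECT "Fruit", COUNT(*) AS cnt
    FROM example_table
    GROUP BY "Fruit"
) AS c
  ON c."Fruit" = t."Fruit"
WHERE c.cnt > 2
ORDER BY t."Year" DESC, t."Fruit";
```

On the sample data this should keep only the apples and oranges rows, and the ORDER BY is there to match the ordering of the desired output.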
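Another option is QUALIFY with a windowed COUNT. The CTE below only reproduces the sample data (with the column named FRUITS):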
WITH CTE AS
(SELECT 'apples' FRUITS, 2018 YEAR, 20000 UNITS
UNION ALL SELECT 'oranges', 2018 YEAR, 600 UNITS
UNION ALL SELECT 'oranges', 2017 YEAR, 6000 UNITS
UNION ALL SELECT 'apples', 2016 YEAR, 2000 UNITS
UNION ALL SELECT 'oranges', 2016 YEAR, 2000 UNITS
UNION ALL SELECT 'apples', 2017 YEAR, 50000 UNITS
UNION ALL SELECT 'potato', 2017 YEAR, 9000 UNITS
UNION ALL SELECT 'apples/oranges' , 2016, 5000
UNION ALL SELECT 'apples/oranges', 2018, 3000 )
SELECT *
FROM CTE
QUALIFY COUNT(DISTINCT YEAR) OVER (PARTITION BY FRUITS) > 2;
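Note that the query above counts distinct years per fruit, not rows. On the sample data the two are the same, but if a fruit could have several rows for the same year and the aim is to count rows, a COUNT(*) window may be closer to what the question describes. A minimal sketch against the real table, assuming the table name example_table and the quoted "Fruit" column from the question:

```sql
-- Sketch: filter on the number of rows per fruit rather than distinct years.
-- Assumes example_table and the quoted "Fruit" column from the question.
SELECT *
FROM example_table
QUALIFY COUNT(*) OVER (PARTITION BY "Fruit") > 2;
```

QUALIFY filters on the result of the window function after it has been computed, so no subquery or GROUP BY is needed here.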