I have a table (Oracle SQL) containing details of a list of item prices at each store location. I want to combine several rows into one -- but only when ALL rows for an item meet the criteria: the item is the same price at all locations.
The table (simplified) looks like this:
list_id, item_id, location_id, item_price
1 1 1 1.99
1 1 2 1.99
1 1 3 1.99
1 2 1 3.99
1 2 2 3.99
1 2 3 3.99
1 3 1 5.99
1 3 2 7.99
1 3 3 8.99
...and I want this:
list_id, item_id, location_id, item_price
1 1 0 1.99
1 2 0 3.99
1 3 1 5.99
1 3 2 7.99
1 3 3 8.99
Rows for items 1 and 2 have been combined into a single row each, with location set to zero (meaning all locations). Rows for item 3 remain unchanged because the price is not the same in ALL locations.
This query helps me to identify when an item doesn't need to be merged (two rows exist with the same item_id):
select count(list_id), item_id, item_price
from list_detail
group by item_id, item_price
...but I can't wrap my head around how it would fit into a larger trigger, script, or whatever which would identify and combine rows.
NOTE: I cannot change the structure of the table because it is relied on by many, many other processes.
How would you best identify and then combine rows where the price is the same in all locations? A script, trigger, scheduled console app?
One option uses window functions, then distinct:
select distinct list_id, item_id, location_id, item_price
from (
select list_id, item_id, item_price,
case when min(item_price) over(partition by list_id, item_id) = max(item_price) over(partition by list_id, item_id)
then 0
else location_id
end location_id
from mytable t
) t
The basic idea is to compare the minimum and the maximum price in groups having the same list_id and item_id. When they are equal, we know there is just one distinct value in the group, so we set location_id to 0; otherwise we keep it as it is. All that is left to do is to keep distinct values.
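To see the min/max comparison at work, here is a minimal sketch that runs the same query against the question's sample rows. It assumes SQLite (through Python's sqlite3 module, version 3.25 or later for window functions) as a stand-in for Oracle; the function name collapse_uniform_prices is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (list_id, item_id, location_id, item_price)
ROWS = [
    (1, 1, 1, 1.99), (1, 1, 2, 1.99), (1, 1, 3, 1.99),
    (1, 2, 1, 3.99), (1, 2, 2, 3.99), (1, 2, 3, 3.99),
    (1, 3, 1, 5.99), (1, 3, 2, 7.99), (1, 3, 3, 8.99),
]

# Same query as above, plus an ORDER BY so the output is deterministic.
QUERY = """
select distinct list_id, item_id, location_id, item_price
from (
    select list_id, item_id, item_price,
           case when min(item_price) over (partition by list_id, item_id)
                   = max(item_price) over (partition by list_id, item_id)
                then 0
                else location_id
           end as location_id
    from mytable
) t
order by list_id, item_id, location_id
"""


def collapse_uniform_prices(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable ("
        "list_id integer, item_id integer, location_id integer, item_price real)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse_uniform_prices(ROWS):
        print(row)
```

Note that this only reads the table; it does not change any stored rows.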
Since you must update some rows and delete others in a single statement, it's best to use a merge statement, which is exactly for this purpose.
The source rowset (s) is the result of an aggregation that identifies the (list_id, item_id) pairs that must be modified.
Note that I assume the price is never null; if it can be null, you must explain how that should be handled.
There will be solutions offered using analytic functions. If efficiency (speed) is important, the solution below should be better; aggregation is generally faster than analytic functions when both do the same job.
merge into sample_data t
using (
select list_id, item_id, min(location_id) as min_loc_id
from sample_data
group by list_id, item_id
having min(item_price) = max(item_price)
) s
on (t.list_id = s.list_id and t.item_id = s.item_id)
when matched then
update
set t.location_id = case when t.location_id = s.min_loc_id then 0 end
delete
where t.location_id is null
;
Rows from the target (which is your base table) will only be affected when they match the source by list_id, item_id; other rows will be left unchanged. (These unchanged rows are the rows for items where the price is not the same at all locations, so the corresponding list_id, item_id does not appear in the source.)
The update part will change the first location id to 0 and all the others to null. Then the delete part will delete all the rows where the location id is null. In this step, the location id is the modified one, after the update part did its work. So all the rows except one, for each affected list_id, item_id, will be deleted by the delete step.
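The two phases described above can be made visible with a rough sketch. SQLite has no MERGE, so this is an emulation, not the Oracle statement itself: the source rowset is materialised as a temp table s, then a separate UPDATE and DELETE are run, with a snapshot taken in between. The function name merge_in_two_steps is hypothetical.

```python
import sqlite3

# Sample rows from the question: (list_id, item_id, location_id, item_price)
ROWS = [
    (1, 1, 1, 1.99), (1, 1, 2, 1.99), (1, 1, 3, 1.99),
    (1, 2, 1, 3.99), (1, 2, 2, 3.99), (1, 2, 3, 3.99),
    (1, 3, 1, 5.99), (1, 3, 2, 7.99), (1, 3, 3, 8.99),
]


def merge_in_two_steps(rows):
    """Return (rows after the update step, rows after the delete step)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sample_data ("
        "list_id integer, item_id integer, location_id integer, item_price real)"
    )
    conn.executemany("insert into sample_data values (?, ?, ?, ?)", rows)

    # Source rowset s: one row per (list_id, item_id) whose price is uniform.
    conn.execute("""
        create temp table s as
        select list_id, item_id, min(location_id) as min_loc_id
        from sample_data
        group by list_id, item_id
        having min(item_price) = max(item_price)
    """)

    # Update step: first location becomes 0, every other matched row becomes null.
    conn.execute("""
        update sample_data
        set location_id = case
                when location_id = (select s.min_loc_id from s
                                    where s.list_id = sample_data.list_id
                                      and s.item_id = sample_data.item_id)
                then 0
            end
        where exists (select 1 from s
                      where s.list_id = sample_data.list_id
                        and s.item_id = sample_data.item_id)
    """)
    snapshot = ("select list_id, item_id, location_id, item_price "
                "from sample_data order by rowid")
    after_update = conn.execute(snapshot).fetchall()

    # Delete step: drop the rows nulled above. Unlike the delete clause of
    # MERGE, this would also remove unmatched rows with a null location_id.
    conn.execute("delete from sample_data where location_id is null")
    after_delete = conn.execute(snapshot).fetchall()
    conn.close()
    return after_update, after_delete


if __name__ == "__main__":
    updated, final = merge_in_two_steps(ROWS)
    print(updated)
    print(final)
```

In Oracle the single MERGE does both steps atomically; the split here is only to show the intermediate state.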
Hmmm... You can use window functions:
select list_id, item_id,
(case when min_price = max_price then 0
else location_id
end) as location_id,
item_price
from (select t.*,
min(item_price) over (partition by list_id, item_id) as min_price,
max(item_price) over (partition by list_id, item_id) as max_price
from t
) t
group by list_id, item_id,
(case when min_price = max_price then 0
else location_id
end), item_price;
Another method would use exists and union all:
select list_id, item_id, location_id, item_price
from t
where exists (select 1
from t t2
where t2.list_id = t.list_id and
t2.item_id = t.item_id and
t2.item_price <> t.item_price
)
union all
select list_id, item_id, 0, max(item_price)
from t
group by list_id, item_id
having min(item_price) = max(item_price);
You can use a MERGE statement to both UPDATE and DELETE and, if you use analytic functions to identify the affected rows, then you can correlate the merge using the ROWID pseudo-column, which can perform a self-join (more efficiently than by comparing values):
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
rn
FROM (
SELECT ROW_NUMBER() OVER ( PARTITION BY list_id, item_id ORDER BY location_id )
AS rn,
MIN( item_price ) OVER ( PARTITION BY list_id, item_id ) AS min_price,
MAX( item_price ) OVER ( PARTITION BY list_id, item_id ) AS max_price
FROM table_name
)
WHERE min_price = max_price
) src
ON ( src.rid = dst.ROWID )
WHEN MATCHED THEN
UPDATE SET location_id = 0
DELETE WHERE src.rn > 1;
Which, for the sample data:
CREATE TABLE table_name ( list_id, item_id, location_id, item_price ) AS
SELECT 1, 1, 1, 1.99 FROM DUAL UNION ALL
SELECT 1, 1, 2, 1.99 FROM DUAL UNION ALL
SELECT 1, 1, 3, 1.99 FROM DUAL UNION ALL
SELECT 1, 2, 1, 3.99 FROM DUAL UNION ALL
SELECT 1, 2, 2, 3.99 FROM DUAL UNION ALL
SELECT 1, 2, 3, 3.99 FROM DUAL UNION ALL
SELECT 1, 3, 1, 5.99 FROM DUAL UNION ALL
SELECT 1, 3, 2, 7.99 FROM DUAL UNION ALL
SELECT 1, 3, 3, 8.99 FROM DUAL;
Updates 2 rows and deletes 4 rows leaving the table as:
LIST_ID | ITEM_ID | LOCATION_ID | ITEM_PRICE
------: | ------: | ----------: | ---------:
1 | 1 | 0 | 1.99
1 | 2 | 0 | 3.99
1 | 3 | 1 | 5.99
1 | 3 | 2 | 7.99
1 | 3 | 3 | 8.99
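For anyone who wants to try the row-number idea without an Oracle instance, here is a hedged approximation in SQLite via Python's sqlite3 (3.25 or later). SQLite's rowid stands in for Oracle's ROWID, and because SQLite lacks MERGE, the source rowset is stored in a temp table src and applied with a separate UPDATE and DELETE. The function name merge_by_rowid is invented for this sketch.

```python
import sqlite3

# Sample rows from the question: (list_id, item_id, location_id, item_price)
ROWS = [
    (1, 1, 1, 1.99), (1, 1, 2, 1.99), (1, 1, 3, 1.99),
    (1, 2, 1, 3.99), (1, 2, 2, 3.99), (1, 2, 3, 3.99),
    (1, 3, 1, 5.99), (1, 3, 2, 7.99), (1, 3, 3, 8.99),
]


def merge_by_rowid(rows):
    """Collapse uniform-price groups, addressing rows by rowid."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_name ("
        "list_id integer, item_id integer, location_id integer, item_price real)"
    )
    conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)

    # src: rowid and row number of every row in a uniform-price group.
    conn.execute("""
        create temp table src as
        select rid, rn
        from (
            select rowid as rid,
                   row_number() over (partition by list_id, item_id
                                      order by location_id) as rn,
                   min(item_price) over (partition by list_id, item_id) as min_price,
                   max(item_price) over (partition by list_id, item_id) as max_price
            from table_name
        )
        where min_price = max_price
    """)

    # Every matched row gets location 0; all but the first per group are deleted.
    conn.execute(
        "update table_name set location_id = 0 "
        "where rowid in (select rid from src)"
    )
    conn.execute(
        "delete from table_name "
        "where rowid in (select rid from src where rn > 1)"
    )
    result = conn.execute(
        "select list_id, item_id, location_id, item_price "
        "from table_name order by list_id, item_id, location_id"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in merge_by_rowid(ROWS):
        print(row)
```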
If you can have NULL values for item_price then the query can be extended to only merge a group when its prices are all non-NULL and equal, or are all NULL:
MERGE INTO table_name dst
USING (
SELECT ROWID AS rid,
rn
FROM (
SELECT ROW_NUMBER() OVER ( PARTITION BY list_id, item_id ORDER BY location_id )
AS rn,
MIN( item_price ) OVER ( PARTITION BY list_id, item_id ) AS min_price,
MAX( item_price ) OVER ( PARTITION BY list_id, item_id ) AS max_price,
COUNT(item_price)
OVER ( PARTITION BY list_id, item_id ) AS num_non_null,
COUNT(*)
OVER ( PARTITION BY list_id, item_id ) AS num_locations
FROM table_name
)
WHERE ( min_price = max_price AND num_non_null = num_locations )
OR ( num_non_null = 0 )
) src
ON ( src.rid = dst.ROWID )
WHEN MATCHED THEN
UPDATE SET location_id = 0
DELETE WHERE src.rn > 1;
Which, for the sample data:
CREATE TABLE table_name ( list_id, item_id, location_id, item_price ) AS
SELECT 1, 1, 1, 1.99 FROM DUAL UNION ALL
SELECT 1, 1, 2, 1.99 FROM DUAL UNION ALL
SELECT 1, 1, 3, 1.99 FROM DUAL UNION ALL
SELECT 1, 2, 2, 3.99 FROM DUAL UNION ALL
SELECT 1, 2, 3, 3.99 FROM DUAL UNION ALL
SELECT 1, 2, 4, 3.99 FROM DUAL UNION ALL
SELECT 1, 3, 1, 5.99 FROM DUAL UNION ALL
SELECT 1, 3, 2, 7.99 FROM DUAL UNION ALL
SELECT 1, 3, 3, 8.99 FROM DUAL UNION ALL
SELECT 1, 4, 8, 1.99 FROM DUAL UNION ALL
SELECT 1, 4, 9, NULL FROM DUAL UNION ALL
SELECT 1, 5, 1, NULL FROM DUAL UNION ALL
SELECT 1, 5, 2, NULL FROM DUAL;
Outputs:
LIST_ID | ITEM_ID | LOCATION_ID | ITEM_PRICE
------: | ------: | ----------: | ---------:
1 | 1 | 0 | 1.99
1 | 2 | 0 | 3.99
1 | 3 | 1 | 5.99
1 | 3 | 2 | 7.99
1 | 3 | 3 | 8.99
1 | 4 | 8 | 1.99
1 | 4 | 9 | null
1 | 5 | 0 | null
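The reason for the two counts is easiest to see per group: MIN and MAX skip NULLs, so item 4 (one price, one NULL) would pass min_price = max_price on its own. The sketch below is an illustration only; it uses plain aggregation instead of the analytic functions above, runs on SQLite through Python's sqlite3 as a stand-in for Oracle, and the function name classify_groups is an assumption of mine.

```python
import sqlite3

# Second sample data set: (list_id, item_id, location_id, item_price)
ROWS = [
    (1, 1, 1, 1.99), (1, 1, 2, 1.99), (1, 1, 3, 1.99),
    (1, 2, 2, 3.99), (1, 2, 3, 3.99), (1, 2, 4, 3.99),
    (1, 3, 1, 5.99), (1, 3, 2, 7.99), (1, 3, 3, 8.99),
    (1, 4, 8, 1.99), (1, 4, 9, None),
    (1, 5, 1, None), (1, 5, 2, None),
]

# count(item_price) skips NULLs, count(*) does not; comparing the two
# tells us whether a group mixes NULL and non-NULL prices.
QUERY = """
select list_id, item_id,
       count(item_price) as num_non_null,
       count(*)          as num_locations,
       case when (min(item_price) = max(item_price)
                  and count(item_price) = count(*))
                 or count(item_price) = 0
            then 1 else 0
       end as mergeable
from table_name
group by list_id, item_id
order by list_id, item_id
"""


def classify_groups(rows):
    """Return one row per (list_id, item_id) with its counts and merge flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table table_name ("
        "list_id integer, item_id integer, location_id integer, item_price real)"
    )
    conn.executemany("insert into table_name values (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in classify_groups(ROWS):
        print(row)
```

Item 4 has one non-NULL price out of two locations, so it is left alone; item 5 has no non-NULL prices at all, so it is merged.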