Inserting data into a table with a case expression

The Zip Code document I received was supposed to include a DMA (Direct Marketing Area) for each Zip Code; however, some of the DMA values are ''. To fix this, I am supposed to take the most common DMA in the Zip Code's County and use it as that Zip Code's DMA.

Currently I have figured out how to determine the maximum number of occurrences for a DMA in each county. For instance, I know that in Abbeville County the most frequent DMA shows up 5 times, and for Acadia, it is 10 times. This data is stored inside of the temporary table #Temp2 that was created using the following code:

INSERT INTO #Temp
    SELECT ROW_NUMBER() OVER(PARTITION BY County, DMA ORDER BY County DESC) AS Num, County, DMA
    FROM [HPW Data].[dbo].[Zip_Codes_All]
    WHERE DMA <> '<NULL>'
INSERT INTO #Temp2 
    SELECT DISTINCT MAX(Num), County
    FROM #Temp 
    GROUP BY County

I achieved this by finding the max row number after partitioning the zip code table into segments containing County, DMA, and Num (the number of occurrences of each County, DMA combination).

Afterwards, I wrote this code in an attempt to replace the '' values in my Zip Code table with their County's most frequent DMA:

INSERT INTO [HPW Data].[dbo].[Zip_Codes_All]
    SELECT Zip_Code, c.County, 
        CASE c.DMA
            WHEN '<NULL>' THEN (SELECT d.DMA WHERE c.County = d.County)
            ELSE c.DMA END AS DMA
        FROM [HPW Data].[dbo].[Zip_Codes_All] AS c,
        (SELECT a.County, DMA FROM #Temp AS a, #Temp2 AS b WHERE a.Num = b.Num AND a.County = b.County) AS d

I think part of the reason it is not working as expected is that some DMAs tie as the most frequent DMA for a County (e.g. Adair County has three DMAs that each show up five times, and five is the most any DMA shows up).

I had a quick try at this, and I think part of your problem is the 1990s JOIN syntax?

WITH ZipCodeDMAs AS (
    SELECT 
        County, 
        DMA,
        COUNT(*) AS freq
    FROM 
        [HPW Data].dbo.Zip_Codes_All
    WHERE 
        DMA != '<NULL>'
    GROUP BY
        County,
        DMA),
MaxDMA AS (
    SELECT
        County,
        DMA,
        ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC) AS order_id
    FROM
        ZipCodeDMAs)
INSERT INTO 
    [HPW Data].dbo.Zip_Codes_All
SELECT 
    Zip_Code, 
    c.County, 
    ISNULL(NULLIF(c.DMA, '<NULL>'), m.DMA) AS DMA  -- treat the '<NULL>' placeholder string as missing
FROM 
    [HPW Data].dbo.Zip_Codes_All c
    INNER JOIN MaxDMA m ON m.County = c.County AND m.order_id = 1;

This uses two CTEs to:
- get the frequency of each County-DMA pair;
- determine the most frequent DMA per County (when several DMAs tie, ROW_NUMBER() picks one of them arbitrarily).
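If you would rather have ties resolved the same way on every run, one option is to add a second sort key. A minimal sketch, assuming alphabetical order on DMA is an acceptable tie-breaker (that choice is my assumption, not something from your data); the RANK() column is only there so you can see which Counties actually have ties:

```sql
WITH ZipCodeDMAs AS (
    SELECT County, DMA, COUNT(*) AS freq
    FROM [HPW Data].dbo.Zip_Codes_All
    WHERE DMA != '<NULL>'
    GROUP BY County, DMA)
SELECT
    County,
    DMA,
    freq,
    -- RANK() gives tied DMAs the same rank, so several rows with freq_rank = 1 means a tie
    RANK() OVER (PARTITION BY County ORDER BY freq DESC) AS freq_rank,
    -- the extra DMA sort key (an assumed tie-breaker) makes the pick repeatable
    ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC, DMA) AS order_id
FROM ZipCodeDMAs
ORDER BY County, order_id;
```

For a County like Adair, this should list the three tied DMAs with freq_rank = 1 and order_id 1 to 3, and only the order_id = 1 row would be used as the replacement.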

Then it's simply a case of swapping in the most frequent DMA in cases where we don't have one in the data. This assumes that your data won't ever have "new" Counties that we never had a DMA for before, as otherwise the INNER JOIN will drop those rows from the result.
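One thing to watch: INSERT INTO adds new rows to Zip_Codes_All rather than changing the ones already there. If what you want is to overwrite the missing values in place, an UPDATE is probably closer to the mark. A rough sketch, assuming the placeholder for a missing DMA is the literal string '<NULL>' (as in your WHERE clause) and that breaking ties alphabetically on DMA is acceptable:

```sql
WITH ZipCodeDMAs AS (
    SELECT County, DMA, COUNT(*) AS freq
    FROM [HPW Data].dbo.Zip_Codes_All
    WHERE DMA != '<NULL>'
    GROUP BY County, DMA),
MaxDMA AS (
    SELECT
        County,
        DMA,
        -- DMA as second sort key is an assumed tie-breaker
        ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC, DMA) AS order_id
    FROM ZipCodeDMAs)
UPDATE c
SET c.DMA = m.DMA
FROM [HPW Data].dbo.Zip_Codes_All AS c
    INNER JOIN MaxDMA AS m
        ON m.County = c.County AND m.order_id = 1
WHERE c.DMA = '<NULL>';  -- only touch rows with the placeholder value
```

With an UPDATE, Counties that have no known DMA at all are simply left unchanged, because the INNER JOIN finds nothing to update them with.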

I didn't really follow your example; you seemed to be using ROW_NUMBER() as a sub-optimal way of counting the frequencies, then running with this throughout the remainder of your code. Also, SELECT * FROM a, b, c went out of fashion around 20 years ago!
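For comparison, here is the derived table d from your query written with an explicit JOIN. It is only a sketch of the syntax and assumes #Temp2 has the columns Num and County, as your own query implies:

```sql
-- Same rows as the derived table d in the question, with the join condition in ON
SELECT a.County, a.DMA
FROM #Temp AS a
    INNER JOIN #Temp2 AS b
        ON a.Num = b.Num
       AND a.County = b.County;
```

Writing it this way also makes it easier to spot that your outer query lists c and d separated by a comma with no join condition between them, so every Zip Code row is paired with every row of d.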
