
Inserting data into a table with a case expression

The Zip Code document I received was supposed to include a DMA (Direct Marketing Area) for each Zip Code; however, some of the DMA values are '<NULL>'. To fix this, I am supposed to take the most common DMA in the Zip Code's County and use it as that Zip Code's DMA.

So far I have figured out how to determine the maximum number of occurrences of a DMA in each County. For instance, I know that in Abbeville County the most frequent DMA shows up 5 times, and in Acadia it shows up 10 times. This data is stored in the temporary table #Temp2, which was created using the following code:

INSERT INTO #Temp
    SELECT ROW_NUMBER() OVER(PARTITION BY County, DMA ORDER BY County DESC) AS Num, County, DMA
    FROM [HPW Data].[dbo].[Zip_Codes_All]
    WHERE DMA <> '<NULL>'
INSERT INTO #Temp2 
    SELECT DISTINCT MAX(Num), County
    FROM #Temp 
    GROUP BY County

I achieved this by partitioning the Zip Code table by County and DMA, numbering the rows within each partition as Num (so the largest Num for a County, DMA combination is its number of occurrences), and then taking the maximum Num per County.

Afterwards, I wrote this code in an attempt to replace the '<NULL>' values in my Zip Code table with their County's most frequent DMA:

INSERT INTO [HPW Data].[dbo].[Zip_Codes_All]
    SELECT Zip_Code, c.County, 
        CASE c.DMA
            WHEN '<NULL>' THEN (SELECT d.DMA WHERE c.County = d.County)
            ELSE c.DMA END AS DMA
        FROM [HPW Data].[dbo].[Zip_Codes_All] AS c,
        (SELECT a.County, DMA FROM #Temp AS a, #Temp2 AS b WHERE a.Num = b.Num AND a.County = b.County) AS d

I think part of the reason it is not working as expected is that some DMAs tie for the most frequent DMA in a County (e.g. Adair County has three DMAs that each show up five times, and five is the most that any DMA shows up there).

I had a quick try at this, and I think part of your problem is the 1990s JOIN syntax.

WITH ZipCodeDMAs AS (
    SELECT 
        County, 
        DMA,
        COUNT(*) AS freq
    FROM 
        [HPW Data].dbo.Zip_Codes_All
    WHERE 
        DMA != '<NULL>'
    GROUP BY
        County,
        DMA),
MaxDMA AS (
    SELECT
        County,
        DMA,
        ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC) AS order_id
    FROM
        ZipCodeDMAs)
INSERT INTO 
    [HPW Data].dbo.Zip_Codes_All
SELECT 
    Zip_Code, 
    c.County, 
    ISNULL(NULLIF(c.DMA, '<NULL>'), m.DMA) AS DMA  -- the data stores the string '<NULL>', not a real NULL
FROM 
    [HPW Data].dbo.Zip_Codes_All c
    INNER JOIN MaxDMA m ON m.County = c.County AND m.order_id = 1;

This uses two CTEs to:
- get the frequency of each County-DMA pair;
- determine the most frequent DMA per County (ties are allowed for: ROW_NUMBER() simply picks one of the tied DMAs arbitrarily).
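
If the arbitrary pick on ties is a concern, one option is to add a secondary sort key so the same DMA wins every time. The following is only a sketch on made-up data: the table variable @Zip and the names 'DMA A' and 'DMA B' are hypothetical stand-ins, not values from the real table.

```sql
-- Hypothetical sample data standing in for Zip_Codes_All.
DECLARE @Zip TABLE (Zip_Code varchar(5), County varchar(50), DMA varchar(50));

INSERT INTO @Zip (Zip_Code, County, DMA) VALUES
    ('00001', 'Adair', 'DMA B'),
    ('00002', 'Adair', 'DMA B'),
    ('00003', 'Adair', 'DMA A'),
    ('00004', 'Adair', 'DMA A'),
    ('00005', 'Adair', '<NULL>');  -- placeholder string, as in the question

WITH Freq AS (
    -- Count how often each DMA occurs per County, ignoring the placeholder.
    SELECT County, DMA, COUNT(*) AS freq
    FROM @Zip
    WHERE DMA <> '<NULL>'
    GROUP BY County, DMA
)
SELECT
    County,
    DMA,
    freq,
    -- freq DESC ranks the most frequent DMA first;
    -- DMA ASC breaks ties the same way on every run.
    ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC, DMA ASC) AS order_id
FROM Freq
ORDER BY County, order_id;
```

Here both DMAs occur twice in Adair, so without the second sort key either could receive order_id = 1. Whether "alphabetically first" is an acceptable rule for the business is a separate question; any other deterministic key would work the same way.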

Then it's simply a case of swapping in the most frequent DMA wherever the data doesn't have one. This assumes that your data will never contain a "new" County for which no DMA has ever been recorded, as otherwise the INNER JOIN will drop that County's rows from the result.
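
Since the goal is to fix existing rows rather than add new ones, an in-place UPDATE may be closer to what is wanted than an INSERT, which would append a second copy of every Zip Code. The following is a sketch, not a tested solution: it assumes the missing values are stored as the literal string '<NULL>' (as the WHERE clause in the question suggests), and the DMA tie-breaker in the ORDER BY is my own addition.

```sql
WITH ZipCodeDMAs AS (
    -- Frequency of each County-DMA pair, ignoring the placeholder.
    SELECT County, DMA, COUNT(*) AS freq
    FROM [HPW Data].dbo.Zip_Codes_All
    WHERE DMA <> '<NULL>'
    GROUP BY County, DMA
),
MaxDMA AS (
    -- Rank DMAs within each County; order_id = 1 is the most frequent.
    SELECT
        County,
        DMA,
        ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC, DMA) AS order_id
    FROM ZipCodeDMAs
)
UPDATE c
SET c.DMA = m.DMA
FROM [HPW Data].dbo.Zip_Codes_All AS c
INNER JOIN MaxDMA AS m
    ON m.County = c.County
   AND m.order_id = 1
WHERE c.DMA = '<NULL>';  -- only touch rows that are missing a DMA
```

With this form, a County that has no known DMA at all is simply not matched by the join, so its rows keep the '<NULL>' placeholder instead of disappearing. It is probably worth trying this on a copy of the table, or inside a transaction that can be rolled back, before running it for real.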

I didn't really follow your example; you seemed to be using ROW_NUMBER() as a sub-optimal way of counting the frequencies, and then running with that throughout the remainder of your code. Also, SELECT * FROM a, b, c went out of fashion around 20 years ago!
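
To illustrate the point about JOIN syntax, here is the derived table d from the question written both ways. This assumes #Temp and #Temp2 exist with the columns the question refers to (Num, County, DMA); their definitions are not shown, so treat the column names as an assumption.

```sql
-- Comma-style (implicit) join, as in the question:
-- the join condition lives in the WHERE clause.
SELECT a.County, a.DMA
FROM #Temp AS a, #Temp2 AS b
WHERE a.Num = b.Num
  AND a.County = b.County;

-- Equivalent explicit join:
-- the join condition is stated in ON, next to the tables it relates.
SELECT a.County, a.DMA
FROM #Temp AS a
INNER JOIN #Temp2 AS b
    ON a.Num = b.Num
   AND a.County = b.County;
```

The two queries return the same rows. The practical difference is that the comma style makes it easy to forget a condition. As far as I can tell, that is what happens in the outer query of the question: Zip_Codes_All AS c and d are listed with a comma and no condition relating them, so every Zip Code row is paired with every row of d. With INNER JOIN ... ON, the missing condition would have been much harder to overlook.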

