The Zip Code document I received was supposed to contain a DMA (Direct Marketing Area) for each Zip Code; however, some of the DMA values are '<NULL>'. To fix this, I am supposed to take the most common DMA for the Zip Code's County and use it as the Zip Code's DMA.
Currently I have figured out how to determine the maximum number of occurrences for a DMA in each county. For instance, I know that in Abbeville County the most frequent DMA shows up 5 times, and for Acadia, it is 10 times. This data is stored inside of the temporary table #Temp2 that was created using the following code:
INSERT INTO #Temp
SELECT ROW_NUMBER() OVER(PARTITION BY County, DMA ORDER BY County DESC) AS Num, County, DMA
FROM [HPW Data].[dbo].[Zip_Codes_All]
WHERE DMA <> '<NULL>'
INSERT INTO #Temp2
SELECT DISTINCT MAX(Num), County
FROM #Temp
GROUP BY County
I achieved this by finding the max row number after partitioning the Zip Code table into segments containing County, DMA, and Num (which is the number of occurrences for any County, DMA combination).
Afterwards, I wrote this code in an attempt to replace the '<NULL>' values in my Zip Code table with their County's most frequent DMA:
INSERT INTO [HPW Data].[dbo].[Zip_Codes_All]
SELECT Zip_Code, c.County,
CASE c.DMA
WHEN '<NULL>' THEN (SELECT d.DMA WHERE c.County = d.County)
ELSE c.DMA END AS DMA
FROM [HPW Data].[dbo].[Zip_Codes_All] AS c,
(SELECT a.County, DMA FROM #Temp AS a, #Temp2 AS b WHERE a.Num = b.Num AND a.County = b.County) AS d
I think part of the reason it is not working as expected is that some Counties have several DMAs tied for most frequent (e.g., Adair County has three DMAs that each show up five times, and five is the highest count for any DMA in that County).
I had a quick try at this, and I think part of your problem may be the 1990s JOIN syntax.
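WITH ZipCodeDMAs AS (
    -- Count how often each DMA occurs within each County, skipping the placeholder
    SELECT
        County,
        DMA,
        COUNT(*) AS freq
    FROM
        [HPW Data].dbo.Zip_Codes_All
    WHERE
        DMA != '<NULL>'
    GROUP BY
        County,
        DMA),
MaxDMA AS (
    -- Rank the DMAs within each County by frequency; order_id = 1 is the most frequent
    SELECT
        County,
        DMA,
        ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC) AS order_id
    FROM
        ZipCodeDMAs)
INSERT INTO
    [HPW Data].dbo.Zip_Codes_All
SELECT
    Zip_Code,
    c.County,
    -- The placeholder is the string '<NULL>', not a real NULL, so convert it before ISNULL
    ISNULL(NULLIF(c.DMA, '<NULL>'), m.DMA) AS DMA
FROM
    [HPW Data].dbo.Zip_Codes_All c
    INNER JOIN MaxDMA m ON m.County = c.County AND m.order_id = 1;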
This uses two CTEs: the first gets the frequency of each County-DMA tuple, and the second determines the most frequent DMA per County (ties are allowed, and one of the tied DMAs is picked at "random").
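On the tie case: with ORDER BY freq DESC alone, which of the tied DMAs gets order_id = 1 is left up to the server. A minimal sketch, assuming you would rather always get the alphabetically first DMA among ties (that rule is my assumption, not something the question asks for), adds DMA as a second sort key. It only selects, so you can inspect the winners before changing anything:

```sql
-- Sketch: same two CTEs, but ties are broken by DMA name so the result is repeatable.
WITH ZipCodeDMAs AS (
    SELECT County, DMA, COUNT(*) AS freq
    FROM [HPW Data].dbo.Zip_Codes_All
    WHERE DMA <> '<NULL>'
    GROUP BY County, DMA
),
MaxDMA AS (
    SELECT County, DMA, freq,
           -- Assumed tie-break rule: alphabetical order of DMA
           ROW_NUMBER() OVER (PARTITION BY County
                              ORDER BY freq DESC, DMA ASC) AS order_id
    FROM ZipCodeDMAs
)
SELECT County, DMA, freq
FROM MaxDMA
WHERE order_id = 1
ORDER BY County;
```

Any other rule (for example, preferring a particular region) would go in the same ORDER BY.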
Then it's simply a case of swapping in the most frequent DMA in cases where we don't have one in the data. This assumes that your data won't ever contain "new" Counties that never had a DMA before; otherwise the INNER JOIN will silently drop those rows from the result.
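If the goal is to fix the existing rows rather than append corrected copies, an in-place UPDATE may be a better fit than INSERT. This is a sketch under that assumption (the question itself uses INSERT), reusing the same CTEs. Rows in a County with no known DMA are not matched by the join and keep their '<NULL>' placeholder, and the follow-up query lists them:

```sql
-- Sketch: overwrite only the placeholder rows, in place.
WITH ZipCodeDMAs AS (
    SELECT County, DMA, COUNT(*) AS freq
    FROM [HPW Data].dbo.Zip_Codes_All
    WHERE DMA <> '<NULL>'
    GROUP BY County, DMA
),
MaxDMA AS (
    SELECT County, DMA,
           ROW_NUMBER() OVER (PARTITION BY County ORDER BY freq DESC) AS order_id
    FROM ZipCodeDMAs
)
UPDATE c
SET DMA = m.DMA
FROM [HPW Data].dbo.Zip_Codes_All AS c
INNER JOIN MaxDMA AS m
    ON m.County = c.County
   AND m.order_id = 1
WHERE c.DMA = '<NULL>';

-- Anything still holding the placeholder belongs to a County with no known DMA.
SELECT County, COUNT(*) AS remaining_rows
FROM [HPW Data].dbo.Zip_Codes_All
WHERE DMA = '<NULL>'
GROUP BY County;
```

Trying this on a copy of the table first, or inside a transaction you can roll back, is probably wise.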
I didn't really follow your example. You seem to be using ROW_NUMBER() as a sub-optimal way of counting the frequencies, then running with this throughout the remainder of your code. Also, SELECT * FROM a, b, c went out of fashion around 20 years ago!!
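To illustrate the syntax point, here is the derived table from your query rewritten with an explicit JOIN. It is only a sketch of the notation: it assumes #Temp and #Temp2 exist with the Num and County columns your query refers to, and it returns the same rows as the comma version, ties included.

```sql
-- Old style:
--   FROM #Temp AS a, #Temp2 AS b
--   WHERE a.Num = b.Num AND a.County = b.County
-- Explicit style: the join condition lives in ON, next to the table it applies to.
SELECT a.County, a.DMA
FROM #Temp AS a
INNER JOIN #Temp2 AS b
    ON a.County = b.County
   AND a.Num = b.Num;
```

Note that in your outer query the two sources (c and d) are separated by a comma with no WHERE clause tying them together, so every row of c is paired with every row of d. With explicit JOIN ... ON syntax that kind of missing condition is much harder to overlook.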