简体   繁体   中英

Finding the most frequent value in sql server 2012

I want to find most frequent product each customer has purchased. my data set is like this :

CustomerID     ProdID    FavouriteProduct
    1              A              ?
    1              A              ?
    1              A              ?
    1              B              ?
    1              A              ?
    1              A              ?
    1              A              ?
    1              B              ?
    2              A              ?
    2              AN             ?
    2              G              ?
    2              C              ?
    2              C              ?
    2              F              ?
    2              D              ?
    2              C              ?

There are so many products,So i cannot put them in a pivot table.

Answer would look like this :

CustomerID     ProdID    FavouriteProduct
    1              A              A
    1              A              A
    1              A              A
    1              B              A
    1              A              A
    1              A              A
    1              A              A
    1              B              A
    2              A              C
    2              AN             C
    2              G              C
    2              C              C
    2              C              C
    2              F              C
    2              D              C
    2              C              C

The query may look like this:

Update table
set FavouriteProduct = (Select 
                            CustomerID, Product, Max(Count(Product)) 
                        From Table 
                        group  by CustomerID, Product) FP    

Another way to get the most frequent product is to use row_number() :

select customerid, productid,
       max(case when seqnum = 1 then productid end) over (partition by customerid) as favoriteproductid
from (select customerid, productid, count(*) as cnt,
             row_number() over (partition by customerid order by count(*) desc) as seqnum
      from customer c
      group by customerid, productid
     ) cp;

To return the rows exactly as you have described in the question, you could try using a table expression (I used a CTE in my example) to first return a popularity ranking, where the higher the number, the more popular the product for each customer.

WITH RankTable AS (
  SELECT
    CustomerID, ProductID, COUNT(*) AS Popularity
  FROM TableA
  GROUP BY CustomerID, ProductID
)

Then the full results table can be returned by first performing an inner join on the original table (TableA) and the table expression (RankTable), and then using a window function to create the values in the FavoriteProduct column.

SELECT 
    P.CustomerID
  , P.ProductID
  , FIRST_VALUE(P.ProductID) OVER(
      PARTITION BY R.CustomerID
      ORDER BY R.Popularity DESC, R.ProductID) AS FavoriteProduct
FROM TableA AS P
  INNER JOIN RankTable AS R
    ON P.CustomerID = R.CustomerID
    AND P.ProductID= R.ProductID;

Thanks to Nick, i found a way to find the most frequent value. i share with you how it works :

   Select CustomerID,ProductID,Count(*) as Number 
   from table A 
   group by CustomerID,ProductID 
   having Count(*)>= (Select Max(Number) from (Select CustomerID,ProductID,Count(*) as Number from table B where B.CustomerID= A.CustomerID  group by CustomerID,Product)C) 

Just in case your SQL doesn't perform fast enough and you have customers also in a smaller table, this might work better::

select C.CustomerId, R.ProductID
from Customer C
outer apply (
  Select top 1 ProductID,Count(*) as Number 
  from table A 
  where A.CustomerId = C.CustomerId
  group by ProductId
  order by Number desc 
) R

This one, based on the example at the end of this page: http://www.sql-server-performance.com/2006/find-frequent-values/ might be quicker:

SELECT CustomerID, ProdID, Cnt 
FROM 
(
    SELECT CustomerID, ProdID, COUNT(*) as Cnt, 
    RANK() OVER (
       PARTITION BY CustomerID
       ORDER BY COUNT(*) DESC
    ) AS Rnk 
    FROM YourTransactionTable
    GROUP BY CustomerID, ProdID
) x 
WHERE Rnk = 1

This one uses a RANK() function. In this case you don't have to join back to the same table (meaning there's a lot less work required)

Now to update your existing data, I like to wrap my dataset in a WITH to make debugging a little easier and the final update a bit simpler:

;WITH
(
  SELECT CustomerID, ProdID, Cnt 
  FROM 
  (
     SELECT CustomerID, ProdID, COUNT(*) as Cnt, 
     RANK() OVER (PARTITION BY CustomerID
     ORDER BY COUNT(*) DESC) AS Rnk 
     FROM TransactionTable
     GROUP BY CustomerID, ProdID
  ) x 
  WHERE Rnk = 1
) As SRC

 UPDATE FavouriteTable
 SET Favourite = SRC.ProdID
 FROM SRC
 WHERE SRC.CustomerID = Favourite.CustomerID

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM