Calculated Column in SQL Server

Question

I have my data in table as:

id  Author_ID   Research_Area       Category_ID  Paper_Count   Paper_Year   Rank  
---------------------------------------------------------------------------------
1   677         feature extraction  8            1             2005         1
2   677         image annotation    11           1             2005         2
3   677         probabilistic model 12           1             2005         3
4   677         semantic            19           1             2007         1
5   677         feature extraction  8            1             2009         1
6   677         image annotation    11           1             2011         1  
7   677         semantic            19           1             2012         1  
8   677         video sequence      5            2             2013         1  
9   1359        adversary model     1            2             2005         1
10  1359        ensemble method     14           2             2005         2
11  1359        image represent     11           2             2005         3
12  1359        adversary model     1            7             2006         1
13  1359        concurrency control 17           5             2006         2
14  1359        information system  12           2             2006         3  
15  ...         
16  ...

Whereas I want to have an output of query as:

id  Author_ID   Category_ID  Paper_Count   Category_Prob   Paper_Year   Rank  
---------------------------------------------------------------------------------
1   677         8            1             0.333           2005         1
2   677         11           1             0.333           2005         2
3   677         12           1             0.333           2005         3
4   677         19           1             1.0             2007         1
5   677         8            1             1.0             2009         1
6   677         11           1             1.0             2011         1  
7   677         19           1             1.0             2012         1  
8   677         5            2             1.0             2013         1  
9   1359        1            2             0.333           2005         1
10  1359        14           2             0.333           2005         2
11  1359        11           2             0.333           2005         3
12  1359        1            7             0.5             2006         1
13  1359        17           5             0.357           2006         2
14  1359        12           2             0.142           2006         3  
15  ...         
16  ...

Whereas Category_Prob is a calculated column which is calculated in two steps as:

Step First , we have to have a SUM of Paper_Count in each Paper_Year for instance ie Paper_Year = 2005 and Author_ID = 677 , the SUM(Paper_Count) = 3

Step Second , then for each Category_ID , we have to divide Paper_Count with value of SUM(Paper_Count) in that Paper_Year which will be 1/3 ie 0.333 and so on...

Moreover, I have tried this query:

SELECT 
    Author_ID, Abstract_Category, Paper_Count,
    [Category_Prob] = Paper_Count / SUM(Paper_Count),
    Paper_Year, Rank
FROM 
    Author_Areas
GROUP BY 
    Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY 
    Author_ID, Paper_Year

But it returns just 1 in the column Category_Prob for all of the rows in the table.

Answer 1

The problem with your query is that you are not grouping by Paper_Year , but also by Author_ID, Abstract_Category, Paper_Count, Rank . Hence SUM(Paper_Count) is equal to Paper_Count for each group.

You can use SUM OVER for this:

SELECT      id, Author_ID, Abstract_Category [Category_ID],  
            Paper_Count, 
            Paper_Count * 1.0 / SUM(Paper_Count)  
            OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
            Paper_Year, Rank
FROM        Author_Areas
ORDER BY    Author_ID, Paper_Year

Note: You have to multiply by 1.0 so as to avoid integer division. Note 2: Perhaps you have to add Author_ID field in the PARTITION BY clause as well, if your actual requirement is to group by author, year.

Answer 2

I suspect (please confirm) that the datatype of all fields involved are integers . When you calculate with int the return type is also int . You should convert the fields to decimal before calculation.

SELECT Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = convert(decimal(10,3), Paper_Count) / convert(decimal(10, 3), SUM(Paper_Count)),
Paper_Year, Rank
FROM Author_Areas
GROUP BY Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY Author_ID, Paper_Year

Calculated Column in SQL Server

Question

2 answers

solution1
6 ACCPTED 2016-04-18 08:20:52

solution2
0 2016-04-18 08:19:55

Calculated Column in SQL Server

Question

2 answers

solution1 6 ACCPTED 2016-04-18 08:20:52

solution2 0 2016-04-18 08:19:55

solution1
6 ACCPTED 2016-04-18 08:20:52

solution2
0 2016-04-18 08:19:55