I have my data in a table as follows:
id  Author_ID  Research_Area        Category_ID  Paper_Count  Paper_Year  Rank
------------------------------------------------------------------------------
1   677        feature extraction   8            1            2005        1
2   677        image annotation     11           1            2005        2
3   677        probabilistic model  12           1            2005        3
4   677        semantic             19           1            2007        1
5   677        feature extraction   8            1            2009        1
6   677        image annotation     11           1            2011        1
7   677        semantic             19           1            2012        1
8   677        video sequence       5            2            2013        1
9   1359       adversary model      1            2            2005        1
10  1359       ensemble method      14           2            2005        2
11  1359       image represent      11           2            2005        3
12  1359       adversary model      1            7            2006        1
13  1359       concurrency control  17           5            2006        2
14  1359       information system   12           2            2006        3
15  ...
16  ...
I want the query to produce this output:
id  Author_ID  Category_ID  Paper_Count  Category_Prob  Paper_Year  Rank
------------------------------------------------------------------------
1   677        8            1            0.333          2005        1
2   677        11           1            0.333          2005        2
3   677        12           1            0.333          2005        3
4   677        19           1            1.0            2007        1
5   677        8            1            1.0            2009        1
6   677        11           1            1.0            2011        1
7   677        19           1            1.0            2012        1
8   677        5            2            1.0            2013        1
9   1359       1            2            0.333          2005        1
10  1359       14           2            0.333          2005        2
11  1359       11           2            0.333          2005        3
12  1359       1            7            0.5            2006        1
13  1359       17           5            0.357          2006        2
14  1359       12           2            0.142          2006        3
15  ...
16  ...
Category_Prob is a calculated column, computed in two steps:
Step 1: Take the SUM of Paper_Count for each Author_ID in each Paper_Year. For instance, for Paper_Year = 2005 and Author_ID = 677, SUM(Paper_Count) = 3.
Step 2: For each Category_ID, divide Paper_Count by the SUM(Paper_Count) for that Paper_Year, which gives 1/3, i.e. 0.333, and so on.
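To make the two steps concrete, here is a minimal sketch of the arithmetic in plain Python. It is only an illustration of the calculation, not the SQL being asked for; the function name category_probs is hypothetical, and the rows are a subset copied from the sample table above.

```python
from collections import defaultdict

# (Author_ID, Category_ID, Paper_Count, Paper_Year), copied from the sample table
rows = [
    (677, 8, 1, 2005),
    (677, 11, 1, 2005),
    (677, 12, 1, 2005),
    (677, 19, 1, 2007),
    (1359, 1, 7, 2006),
    (1359, 17, 5, 2006),
    (1359, 12, 2, 2006),
]


def category_probs(rows):
    """Return (Author_ID, Category_ID, Paper_Count, Category_Prob, Paper_Year) tuples."""
    # Step 1: total Paper_Count per (Author_ID, Paper_Year)
    totals = defaultdict(int)
    for author, _category, count, year in rows:
        totals[(author, year)] += count
    # Step 2: divide each row's Paper_Count by the total of its group
    return [
        (author, category, count, round(count / totals[(author, year)], 3), year)
        for author, category, count, year in rows
    ]


for row in category_probs(rows):
    print(row)
```

Note that round() gives 0.143 for 2/14, whereas the desired output lists 0.142, which looks truncated rather than rounded.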
I have tried this query:
SELECT
Author_ID, Abstract_Category, Paper_Count,
[Category_Prob] = Paper_Count / SUM(Paper_Count),
Paper_Year, Rank
FROM
Author_Areas
GROUP BY
Author_ID, Abstract_Category, Paper_Year, Paper_Count, Rank
ORDER BY
Author_ID, Paper_Year
But it returns just 1 in the column Category_Prob for all of the rows in the table.
The problem with your query is that you are grouping not only by Paper_Year, but also by Author_ID, Abstract_Category, Paper_Count, and Rank. In the sample data each group therefore contains a single row, so SUM(Paper_Count) is equal to Paper_Count for each group and the ratio is always 1.
You can use SUM OVER for this:
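The effect of the wide GROUP BY can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made only so the example is self-contained), a few rows from the question, and a hypothetical helper named grouped_probs. The extra Group_Sum column is added purely to show what SUM sees inside each group.

```python
import sqlite3


def grouped_probs():
    """Run the question's GROUP BY pattern on a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Author_Areas ("
        "Author_ID INTEGER, Abstract_Category INTEGER, "
        "Paper_Count INTEGER, Paper_Year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Author_Areas VALUES (?, ?, ?, ?)",
        [
            (677, 8, 1, 2005),
            (677, 11, 1, 2005),
            (677, 12, 1, 2005),
            (1359, 1, 7, 2006),
            (1359, 17, 5, 2006),
            (1359, 12, 2, 2006),
        ],
    )
    query = """
        SELECT Author_ID, Abstract_Category, Paper_Count,
               SUM(Paper_Count) AS Group_Sum,
               Paper_Count / SUM(Paper_Count) AS Category_Prob
        FROM Author_Areas
        GROUP BY Author_ID, Abstract_Category, Paper_Year, Paper_Count
        ORDER BY Author_ID, Paper_Year, Abstract_Category
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in grouped_probs():
    print(row)
```

Every row forms its own group here, so Group_Sum matches Paper_Count and Category_Prob comes out as 1 on each line.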
SELECT id, Author_ID, Abstract_Category [Category_ID],
Paper_Count,
Paper_Count * 1.0 / SUM(Paper_Count)
OVER (PARTITION BY Author_ID, Paper_Year) AS [Category_Prob],
Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year
Note: You have to multiply by 1.0 to avoid integer division. Note 2: Author_ID is included in the PARTITION BY clause alongside Paper_Year, on the assumption that your actual requirement is to total Paper_Count per author and year.
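As a quick check of the windowed approach, the sketch below runs the same SUM ... OVER (PARTITION BY ...) pattern against a few of the question's rows. It uses Python's built-in sqlite3 module instead of SQL Server, and assumes the bundled SQLite is version 3.25 or later (needed for window functions). The helper name windowed_probs is hypothetical, and the Rank column is left out for brevity.

```python
import sqlite3


def windowed_probs():
    """Return (id, Category_Prob) pairs computed with a windowed SUM."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Author_Areas ("
        "id INTEGER, Author_ID INTEGER, Abstract_Category INTEGER, "
        "Paper_Count INTEGER, Paper_Year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Author_Areas VALUES (?, ?, ?, ?, ?)",
        [
            (1, 677, 8, 1, 2005),
            (2, 677, 11, 1, 2005),
            (3, 677, 12, 1, 2005),
            (4, 677, 19, 1, 2007),
            (12, 1359, 1, 7, 2006),
            (13, 1359, 17, 5, 2006),
            (14, 1359, 12, 2, 2006),
        ],
    )
    # Multiplying by 1.0 forces floating-point division; the window total
    # is computed per (Author_ID, Paper_Year) without collapsing rows.
    query = """
        SELECT id,
               Paper_Count * 1.0 / SUM(Paper_Count)
                   OVER (PARTITION BY Author_ID, Paper_Year) AS Category_Prob
        FROM Author_Areas
        ORDER BY id
    """
    result = [(row_id, round(prob, 3)) for row_id, prob in conn.execute(query)]
    conn.close()
    return result


for row in windowed_probs():
    print(row)
```

Unlike GROUP BY, the window keeps one output row per input row, which is why the per-year total can sit next to each individual Paper_Count.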
I suspect (please confirm) that the data types of all fields involved are integers. When you calculate with int, the return type is also int. You should convert the fields to decimal before the calculation.
SELECT Author_ID, Abstract_Category, Paper_Count,
-- Convert to decimal to avoid integer division. The total is a windowed SUM per
-- author and year; under the original GROUP BY, SUM(Paper_Count) equals Paper_Count.
[Category_Prob] = convert(decimal(10,3), Paper_Count) / convert(decimal(10,3), SUM(Paper_Count) OVER (PARTITION BY Author_ID, Paper_Year)),
Paper_Year, Rank
FROM Author_Areas
ORDER BY Author_ID, Paper_Year
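The integer-division behaviour itself is easy to see in isolation. The sketch below again uses Python's sqlite3 module as a stand-in for SQL Server, with CAST(... AS REAL) playing the role of convert(decimal(10,3), ...); that substitution is an assumption for illustration, and the exact result type and precision differ in T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both operands are integers in the first expression, so the quotient is
# truncated; casting one operand to a non-integer type keeps the fraction.
int_result, real_result = conn.execute(
    "SELECT 1 / 3, CAST(1 AS REAL) / 3"
).fetchone()
conn.close()

print(int_result)
print(round(real_result, 3))
```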