How to get the percentage of identical text values in a column of a SQL Server table

I have a column named research_area in a SQL Server table, with values like this:

digital library
approximation algorithm
real time application
approximation algorithm
applied mathematics
image processing
applied mathematics
evolutionary computation
image processing
image processing
image processing
image annotation
image segmentation
natural language processing
image processing
image segmentation
anomaly detection
image annotation
efficient algorithm
time series analysis
image annotation
image annotation
image processing
routing wireless networks
constrained project scheduling
image annotation
image segmentation
differential equation
image processing
collaborative filtering
image segmentation
image annotation
efficient algorithm
data reduction
image segmentation
image annotation
image processing
applied mathematics
image segmentation
image segmentation

Now I want to process this column so that I get a count for each distinct value, like this:

image processing                8
image annotation                7
image segmentation              7
applied mathematics             3
approximation algorithm         2
efficient algorithm             2
digital library                 1
real time application           1
evolutionary computation        1
natural language processing     1
anomaly detection               1
time series analysis            1
routing wireless networks       1
constrained project scheduling  1
differential equation           1
collaborative filtering         1
data reduction                  1

How can I get this, whether by adding columns or in some other way?

This is what I have tried:

SELECT 
    aid, research_area as [Name], COUNT(research_area) as [Count] 
FROM
    sub_aminer_paper 
GROUP BY 
    research_area 
WHERE
    aid = 1653869

But it gives an error:

The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.

You must CAST the column to varchar or nvarchar before you can use it in a GROUP BY clause. Note also that WHERE has to come before GROUP BY, and that COUNT(*) avoids passing the text column to COUNT:

SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid

SQL Server Error Messages - Msg 306
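
To see the cast in isolation, here is a minimal sketch against a throwaway temp table. The table name #area_demo and its sample rows are made up for illustration; only the aid and research_area column names come from the question. It casts to VARCHAR(MAX) rather than VARCHAR(100), on the assumption that you would rather not truncate longer values.

```sql
-- Hypothetical temp table mirroring the question's TEXT column (assumption).
CREATE TABLE #area_demo (aid INT, research_area TEXT);

INSERT INTO #area_demo (aid, research_area) VALUES
    (1, 'image processing'),
    (1, 'image annotation'),
    (1, 'image processing'),
    (2, 'digital library');

-- Grouping on the raw TEXT column raises Msg 306; grouping on the cast works.
SELECT CAST(research_area AS VARCHAR(MAX)) AS research_area,
       COUNT(*) AS [Count]
FROM #area_demo
WHERE aid = 1
GROUP BY CAST(research_area AS VARCHAR(MAX))
ORDER BY [Count] DESC;

DROP TABLE #area_demo;
```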

Wait, let me clarify your question. Do you want the weight of each identical value in a given column, relative to the total number of rows in that column?

That is, if there are 100 rows and 8 of them hold the same value, the number 8 should indicate that 8% of the rows are "whatever", and so on?

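You can use this. The text column is cast so that it can be grouped, and the aid filter is applied before grouping so that the percentage is taken over that aid's rows only:

SELECT CAST(research_area AS VARCHAR(100)) AS research_area,
       aid,
       COUNT(*) AS SumOfValues,
       100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS percnt
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid;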

EDIT: Gives you count and percent of each value.
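
As a rough illustration of the percentage arithmetic, the sketch below runs the same pattern on a made-up temp table (#pct_demo and its rows are hypothetical). SUM(COUNT(*)) OVER () adds up the per-group counts, so each group's share is its own count divided by that total. The DECIMAL(5,2) cast is just one way to round the result.

```sql
-- Hypothetical sample data; a VARCHAR column is used so no cast is needed here.
CREATE TABLE #pct_demo (research_area VARCHAR(100));

INSERT INTO #pct_demo (research_area) VALUES
    ('image processing'),
    ('image annotation'),
    ('image processing'),
    ('digital library');

SELECT research_area,
       COUNT(*) AS cnt,
       -- group count divided by the sum of all group counts (4 rows here)
       CAST(100.0 * COUNT(*) / SUM(COUNT(*)) OVER () AS DECIMAL(5,2)) AS pct
FROM #pct_demo
GROUP BY research_area
ORDER BY cnt DESC, research_area;
-- image processing  2  50.00
-- digital library   1  25.00
-- image annotation  1  25.00

DROP TABLE #pct_demo;
```

Applied to the 40 sample rows in the question, and assuming they all belong to one aid, image processing (8 rows) would come out at 20% and image annotation (7 rows) at 17.5%.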

After modifying the answer given by @Shaharyar, this is the query that worked:

SELECT aid, CAST(research_area AS VARCHAR(100)) AS [research_area], COUNT(*) AS [Count]
FROM sub_aminer_paper
WHERE aid = 1653869
GROUP BY CAST(research_area AS VARCHAR(100)), aid

It produces the required output.
Thanks, Shaharyar.
