Finding correlations in SQL Server
I just want to know whether the following can be done ENTIRELY in SQL Server.
I have a table with 3 columns: SENTENCE ID (PK), SENTENCE (strings of arbitrary length), and PATTERNS (2- or 3-word patterns that are found in the SENTENCE).
I need to find the correlation of all the distinct PATTERNS with each other.
If I do it externally (using Python and ODBC), I need to go through the following steps:
FOR each distinct PATTERN
Next
Let me assume that PATTERN follows the form of a like expression, and that you want to count a pattern for a sentence only once.
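As a rough illustration of that assumption, a stored 2- or 3-word pattern could be turned into a like operand as sketched below. The helper name to_like is hypothetical, and the sketch assumes the patterns are literal text that should match anywhere in the sentence. It matches substrings, not whole words, so 'lazy dog' would also match 'lazy dogs'.

```python
def to_like(pattern: str) -> str:
    """Wrap a literal word pattern so it can be used as a T-SQL LIKE operand.

    Hypothetical helper: assumes the pattern is plain text, not already
    a LIKE expression.
    """
    escaped = []
    for ch in pattern:
        # In T-SQL LIKE, %, _ and [ are special; [x] matches the literal x.
        escaped.append(f"[{ch}]" if ch in "%_[" else ch)
    # Leading and trailing % let the pattern match anywhere in the sentence.
    return "%" + "".join(escaped) + "%"


print(to_like("quick brown"))  # %quick brown%
print(to_like("100% sure"))    # %100[%] sure%
```

If the patterns are already stored with their wildcards, no conversion is needed.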
If so, you can do the following. Get the matches between all sentences and patterns:
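with sp as (
      -- one row per (sentence, pattern) match, plus the number of sentences matching each pattern
      select s.sentenceID, p.pattern,
             count(*) over (partition by p.pattern) as NumSentences
      from Sentences s join
           Patterns p
           on s.sentence like p.pattern
     )
select sp1.pattern as Pattern1, sp2.pattern as Pattern2,
       max(sp1.NumSentences) as Pattern1Count, max(sp2.NumSentences) as Pattern2Count,
       count(*) as BothCount
from sp sp1 join
     sp sp2
     on sp1.sentenceID = sp2.sentenceID and  -- both patterns must match the same sentence
        sp1.pattern < sp2.pattern            -- <= if you want counts for a single pattern
group by sp1.pattern, sp2.pattern;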
You don't explicitly say what kind of output you want, but this should be sufficient.
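If "correlation" here means the Pearson correlation between two 0/1 indicators (pattern present in a sentence or not), also called the phi coefficient, then the two per-pattern counts, the joint count, and the total number of sentences are enough to compute it. That reading is an assumption, since the question does not define the measure. A minimal sketch, with a hypothetical function name:

```python
import math


def phi(n_total, n1, n2, n_both):
    """Phi coefficient of two 0/1 indicators, computed from sentence counts.

    n_total: number of sentences; n1, n2: sentences matching each pattern;
    n_both: sentences matching both patterns.
    """
    denom = math.sqrt(n1 * n2 * (n_total - n1) * (n_total - n2))
    if denom == 0:
        # Undefined when a pattern matches no sentence or every sentence.
        return None
    return (n_total * n_both - n1 * n2) / denom


print(phi(4, 2, 2, 2))  # 1.0: the patterns always occur together
print(phi(4, 2, 2, 1))  # 0.0: no association
print(phi(4, 2, 2, 0))  # -1.0: the patterns never occur together
```

Pairs of patterns that never share a sentence do not appear in a joined result at all, so their joint count has to be treated as 0.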
So, with some reasonable assumptions, you can do this in SQL.
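To try the idea without a SQL Server instance, the sketch below runs a co-occurrence query of this shape against an in-memory SQLite database from Python. SQLite is only a stand-in here (window functions need SQLite 3.25 or later), and the sample sentences and patterns are made up. The self-join is on sentenceID, so a pair is counted only when both patterns match the same sentence.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sentences (sentenceID INTEGER PRIMARY KEY, sentence TEXT);
    CREATE TABLE Patterns (pattern TEXT PRIMARY KEY);
    -- Made-up sample data for illustration only.
    INSERT INTO Sentences VALUES
        (1, 'the quick brown fox'),
        (2, 'the quick brown dog'),
        (3, 'a lazy brown fox'),
        (4, 'the lazy dog');
    INSERT INTO Patterns VALUES ('%quick brown%'), ('%brown fox%'), ('%lazy%');
""")

QUERY = """
WITH sp AS (
    -- One row per (sentence, pattern) match, with the per-pattern sentence count.
    SELECT s.sentenceID, p.pattern,
           COUNT(*) OVER (PARTITION BY p.pattern) AS NumSentences
    FROM Sentences s
    JOIN Patterns p ON s.sentence LIKE p.pattern
)
SELECT sp1.pattern, sp2.pattern,
       MAX(sp1.NumSentences), MAX(sp2.NumSentences),
       COUNT(*) AS BothCount
FROM sp sp1
JOIN sp sp2
  ON sp1.sentenceID = sp2.sentenceID
 AND sp1.pattern < sp2.pattern
GROUP BY sp1.pattern, sp2.pattern
ORDER BY sp1.pattern, sp2.pattern
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With this sample data the query returns two rows: '%brown fox%' with '%lazy%' and '%brown fox%' with '%quick brown%', each with counts 2, 2 and a joint count of 1. The pair '%lazy%' and '%quick brown%' is absent because no sentence contains both.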