SQL query for large table size
I need help finding similar values in a SQL database. The table structure is as follows:
id | item_id_nm | height | width | length | weight
----------------------------------------------------------------------------------
1 | 00000000001 | 1.0 | 1.0 | 1.0 | 1.0
2 | 00000000001 | 1.1 | 1.0 | 0.9 | 1.1
3 | 00000000001 | 2.0 | 1.0 | 1.0 | 1.0
4 | 00000000002 | 1.0 | 1.0 | 1.0 | 1.0
5 | 00000000002 | 1.0 | 1.1 | 1.1 | 1.0
6 | 00000000002 | 1.0 | 1.0 | 1.0 | 2.0
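id obviously cannot repeat, but item_id_nm can (in practice it can occur many times, i.e. more than twice).
How would you write the SQL to find duplicate item_id_nm values, but only when the height, width, length, or weight differs by more than 30%?
I know it needs to loop through the table, but how do I do the check? Thanks for your help.
Edit: adding an example of the 30% difference. id = 3 has a height of 2.0, which is 200% of the 1.0 height of id 1 (and far more than 30% above the 1.1 of id 2). Sorry for not being clear: the 30% difference can occur in any of height, width, length, or weight, and if any one of them differs by 30%, the row should be counted as a duplicate of the others.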
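To make the criterion concrete, here is a minimal sketch of the per-pair check in Python. It assumes that "differs by more than 30%" means the larger value exceeds 1.3 times the smaller one and that all values are positive; the question does not pin down which value the percentage is relative to, so `differs` and `is_outlier_pair` are hypothetical helpers, not part of the original post.

```python
# Assumption: two values "differ by more than 30%" when the larger one
# exceeds the smaller one by more than 30% (values are assumed positive).
def differs(a, b, threshold=0.30):
    lo, hi = sorted((a, b))
    return hi > lo * (1 + threshold)


def is_outlier_pair(row_a, row_b, threshold=0.30):
    # A pair counts as soon as any one of the four measurements differs.
    return any(differs(x, y, threshold) for x, y in zip(row_a, row_b))


# (height, width, length, weight) for ids 1, 2 and 3 of the sample table
row1 = (1.0, 1.0, 1.0, 1.0)
row2 = (1.1, 1.0, 0.9, 1.1)
row3 = (2.0, 1.0, 1.0, 1.0)

print(is_outlier_pair(row1, row2))  # False: every value is within 30%
print(is_outlier_pair(row1, row3))  # True: height 2.0 vs 1.0
```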
This should give you the rows that differ from the average by more than 30%:
SELECT t1.*
FROM tbl t1
INNER JOIN (
    SELECT
        item_id_nm,
        AVG(width) awidth, AVG(height) aheight,
        AVG(length) alength, AVG(weight) aweight
    FROM tbl
    GROUP BY item_id_nm ) t2
USING (item_id_nm)
WHERE
    width > awidth * 1.3 OR width < awidth * 0.7
    OR height > aheight * 1.3 OR height < aheight * 0.7
    OR length > alength * 1.3 OR length < alength * 0.7
    OR weight > aweight * 1.3 OR weight < aweight * 0.7
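As a quick way to see what the average-based query returns, the sketch below loads the sample rows from the question into an in-memory SQLite database and runs essentially the same query (selecting only the id). SQLite and the Python harness are assumptions made for the sake of a runnable example; the answer itself does not name a database engine.

```python
import sqlite3

# Sample rows from the question: id, item_id_nm, height, width, length, weight
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

QUERY = """
SELECT t1.id
FROM tbl t1
INNER JOIN (
    SELECT item_id_nm,
           AVG(width) AS awidth, AVG(height) AS aheight,
           AVG(length) AS alength, AVG(weight) AS aweight
    FROM tbl
    GROUP BY item_id_nm
) t2 USING (item_id_nm)
WHERE width  > awidth  * 1.3 OR width  < awidth  * 0.7
   OR height > aheight * 1.3 OR height < aheight * 0.7
   OR length > alength * 1.3 OR length < alength * 0.7
   OR weight > aweight * 1.3 OR weight < aweight * 0.7
ORDER BY t1.id
"""


def rows_far_from_average():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
        "height REAL, width REAL, length REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


outlier_ids = rows_far_from_average()
print(outlier_ids)  # [3, 6]
```

One thing to keep in mind with this approach: the outlier itself pulls the average. In a group with only two rows, say 1.0 and 1.5, the average is 1.25 and neither row falls outside 0.875 to 1.625, even though the two values differ by 50%.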
This should give you the pairs of rows that differ by more than 30%:
SELECT t1.*, t2.*
FROM tbl t1
INNER JOIN tbl t2
USING (item_id_nm)
WHERE
    (t1.width > t2.width * 1.3 OR t1.width < t2.width * 0.7)
    OR (t1.height > t2.height * 1.3 OR t1.height < t2.height * 0.7)
    OR (t1.length > t2.length * 1.3 OR t1.length < t2.length * 0.7)
    OR (t1.weight > t2.weight * 1.3 OR t1.weight < t2.weight * 0.7)
I think you could use something like this:
SELECT item_id_nm
FROM yourtable
GROUP BY item_id_nm
HAVING
    MIN(height)*1.3 < MAX(height) OR
    MIN(width)*1.3 < MAX(width) OR
    MIN(length)*1.3 < MAX(length) OR
    MIN(weight)*1.3 < MAX(weight)
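The MIN/MAX query above can be tried out with a small sketch like the following, again using an in-memory SQLite database as an assumed stand-in for the real engine. Rows 7 and 8 (item 00000000003) are hypothetical and were added only to show a group whose values all stay within 30% of each other.

```python
import sqlite3

# Rows 1-6 come from the question; rows 7-8 are hypothetical additions
# showing an item whose measurements never differ by more than 30%.
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
    (7, "00000000003", 1.0, 1.0, 1.0, 1.0),
    (8, "00000000003", 1.2, 1.0, 1.1, 1.0),
]

QUERY = """
SELECT item_id_nm
FROM yourtable
GROUP BY item_id_nm
HAVING MIN(height) * 1.3 < MAX(height)
    OR MIN(width)  * 1.3 < MAX(width)
    OR MIN(length) * 1.3 < MAX(length)
    OR MIN(weight) * 1.3 < MAX(weight)
ORDER BY item_id_nm
"""


def items_with_spread():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
        "height REAL, width REAL, length REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    items = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return items


flagged_items = items_with_spread()
print(flagged_items)  # ['00000000001', '00000000002']
```

This version reads the table once and returns only the item_id_nm values, so it says which items contain a spread of more than 30% but not which rows are responsible.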
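-- Note: this compares height and length against width within the same row,
-- and SELECT * with GROUP BY only runs on engines that allow non-aggregated
-- columns (for example MySQL with ONLY_FULL_GROUP_BY disabled).
SELECT
    *
FROM
    TableName
WHERE
    (height > 1.3 * width OR height < 0.7 * width) OR
    (length > 1.3 * width OR length < 0.7 * width)
GROUP BY
    item_id_nm
HAVING
    COUNT(item_id_nm) > 1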
I would use:
SELECT s1.id AS id1, s2.id AS id2
     , s1.height AS h1, s2.height AS h2
     , s1.width AS width1, s2.width AS width2
     , s1.length AS l1, s2.length AS l2
     , s1.weight AS weight1, s2.weight AS weight2
FROM stack s1
INNER JOIN stack s2
    ON s1.item_id_nm = s2.item_id_nm
-- s1.id < s2.id already excludes self-pairs and reports each pair once
WHERE s1.id < s2.id
  AND (abs(100-((s2.height*100)/s1.height)) > 30
    OR abs(100-((s2.width*100)/s1.width)) > 30
    OR abs(100-((s2.length*100)/s1.length)) > 30
    OR abs(100-((s2.weight*100)/s1.weight)) > 30)
Using PostgreSQL ( http://sqlfiddle.com/#!12/e5f25/15 ). This query does not return duplicate rows: each pair is reported once.
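For readers without access to the fiddle, the pairwise query can be reproduced roughly as follows. This is a sketch that swaps PostgreSQL for an in-memory SQLite database (an assumption made so the example runs with the Python standard library alone) and selects only the two ids.

```python
import sqlite3

# Sample rows from the question: id, item_id_nm, height, width, length, weight
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

QUERY = """
SELECT s1.id, s2.id
FROM stack s1
INNER JOIN stack s2
    ON s1.item_id_nm = s2.item_id_nm
WHERE s1.id < s2.id
  AND (abs(100 - ((s2.height * 100) / s1.height)) > 30
    OR abs(100 - ((s2.width  * 100) / s1.width))  > 30
    OR abs(100 - ((s2.length * 100) / s1.length)) > 30
    OR abs(100 - ((s2.weight * 100) / s1.weight)) > 30)
ORDER BY s1.id, s2.id
"""


def differing_pairs():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stack (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
        "height REAL, width REAL, length REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO stack VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    pairs = [(a, b) for a, b in conn.execute(QUERY)]
    conn.close()
    return pairs


pairs = differing_pairs()
print(pairs)  # [(1, 3), (2, 3), (4, 6), (5, 6)]
```

Note that the percentage here is measured relative to the row with the lower id (s1), so the check is not symmetric: going from 1.5 to 2.0 is an increase of about 33%, while going from 2.0 to 1.5 is a decrease of 25%. Whether that matters depends on how the 30% rule is meant to be read.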