This is my sample data set...
CREATE TABLE blockhashtable (id int PRIMARY KEY AUTO_INCREMENT,pos int,filehash varchar(35), blockhash varchar(130) );
insert into blockhashtable
(pos,filehash,blockhash) values
(1, "random_md51", "randstr1"),
(2, "random_md51", "randstr2"),
(3, "random_md51", "randstr3"),
(1, "random_md52", "randstr2"),
(2, "random_md52", "randstr2"),
(3, "random_md52", "randstr2"),
(4, "random_md52", "randstr1"),
(5, "random_md52", "randstr7"),
(1, "random_md53", "randstr2"),
(2, "random_md53", "randstr1"),
(3, "random_md53", "randstr2"),
(4, "random_md53", "randstr1"),
(1, "random_md54", "randstr1"),
(2, "random_md54", "randstr55");
Current SQL query (needs to be fixed):
SELECT filehash
, GROUP_CONCAT(pos ORDER BY pos) pos
, (avg(blockhash IN('randstr1','randstr2','randstr3','randstr2','randstr2'))) as ratio
FROM blockhashtable
GROUP
BY filehash
Current output (needs to be fixed):
filehash | pos | ratio
:---------- | :-------- | -----:
random_md51 | 1,2,3 | 1
random_md52 | 1,2,3,4,5 | 0.8
random_md53 | 1,2,3,4 | 1
random_md54 | 1,2 | 0.5
SQL Fiddle: http://sqlfiddle.com/#!9/6b5220/10
Expected output:
filehash | pos | ratio
:---------- | :------ | -----:
random_md51 | 1,2,3 | 1
random_md52 | 1,2,3,4 | 0.8
random_md53 | 1,2,3 | 0.75
random_md54 | 1 | 0.5
I am basically trying to find "similar blockhash" values between the query list and the SQL table.
About the ratio column:
If randstr1 appears only once in the query list, then I want at most 1 match for randstr1 per filehash in the table.
In the third output row, ratio is 0.75 because randstr1 appears only once in the query list, even though it appears twice in the table for that filehash, so 3 of the 4 rows match. randstr2 is matched both times in the third row because it appears two or more times in the query list.
About the pos column: I just want to know the pos values of the matched blocks.
With the ROW_NUMBER() window function (available in MySQL 8.0 and later) you can number the occurrences of each blockhash within a filehash, then ignore any 'randstr1' row beyond the first and any 'randstr2' row beyond the third:
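with
-- number each blockhash occurrence within its filehash, in pos order
row_numbers as (
  select *,
         row_number() over (partition by filehash, blockhash order by pos) rn
  from blockhashtable
),
-- a row is valid only while its occurrence number stays within the
-- number of times that blockhash appears in the query list
cte as (
  select *,
         (blockhash = 'randstr1' and rn = 1)       -- listed once
         or
         (blockhash = 'randstr2' and rn <= 3)      -- listed three times
         or
         (blockhash = 'randstr3' and rn = 1) valid -- listed once
  from row_numbers
)
select filehash,
       group_concat(case when valid then pos end order by pos) pos,
       avg(valid) as ratio
from cte
group by filehash;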
See the demo.
Results:
> filehash | pos | ratio
> :---------- | :------ | -----:
> random_md51 | 1,2,3 | 1.00
> random_md52 | 1,2,3,4 | 0.80
> random_md53 | 1,2,3 | 0.75
> random_md54 | 1 | 0.50
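The per-value limits in the query above are written out by hand. As a rough sketch of one way to avoid that, the limits could be derived from the query list itself by counting how often each value occurs. The CTE names `query_list` and `limits` and the column `max_matches` are made up for this illustration and are not part of the original answer. It assumes MySQL 8.0 or later, and that the string literals use a collation compatible with the `blockhash` column.

```sql
with
-- the query list, one row per entry (duplicates kept on purpose)
query_list as (
  select 'randstr1' as blockhash union all
  select 'randstr2' union all
  select 'randstr3' union all
  select 'randstr2' union all
  select 'randstr2'
),
-- how many times each blockhash may be matched per filehash
limits as (
  select blockhash, count(*) as max_matches
  from query_list
  group by blockhash
),
-- number each blockhash occurrence within its filehash, in pos order
row_numbers as (
  select b.*,
         row_number() over (partition by b.filehash, b.blockhash
                            order by b.pos) as rn
  from blockhashtable b
),
-- rows whose blockhash is not in the list get NULL from the left join,
-- which coalesce turns into 0 (not valid)
cte as (
  select r.*,
         coalesce(r.rn <= l.max_matches, 0) as valid
  from row_numbers r
  left join limits l on l.blockhash = r.blockhash
)
select filehash,
       group_concat(case when valid then pos end order by pos) as pos,
       avg(valid) as ratio
from cte
group by filehash;
```

On the sample data this should give the same rows as the hand-written version, since the derived limits come out as 1 for randstr1, 3 for randstr2 and 1 for randstr3. Only the `query_list` CTE needs to change when the list of block hashes changes.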