
Find common values between a SQL table and a list in a SQL query

This is my sample data set...

CREATE TABLE blockhashtable (id int PRIMARY KEY AUTO_INCREMENT,pos int,filehash varchar(35), blockhash varchar(130) );

insert into blockhashtable 
(pos,filehash,blockhash) values 
(1, "random_md51", "randstr1"),
(2, "random_md51", "randstr2"),
(3, "random_md51", "randstr3"),
(1, "random_md52", "randstr2"),
(2, "random_md52", "randstr2"),
(3, "random_md52", "randstr2"),
(4, "random_md52", "randstr1"),
(5, "random_md52", "randstr7"),
(1, "random_md53", "randstr2"),
(2, "random_md53", "randstr1"),
(3, "random_md53", "randstr2"),
(4, "random_md53", "randstr1"),
(1, "random_md54", "randstr1"),
(2, "random_md54", "randstr55");

Current SQL query (needs to be fixed):

SELECT filehash
     , GROUP_CONCAT(pos ORDER BY pos) pos
     , (avg(blockhash IN('randstr1','randstr2','randstr3','randstr2','randstr2'))) as ratio
  FROM blockhashtable
 GROUP
    BY filehash

Current output (needs to be fixed):

filehash    pos        ratio
random_md51 1,2,3      1
random_md52 1,2,3,4,5  0.8
random_md53 1,2,3,4    1
random_md54 1,2        0.5

SQL Fiddle: http://sqlfiddle.com/#!9/6b5220/10

Expected output:

filehash    pos        ratio
random_md51 1,2,3      1
random_md52 1,2,3,4    0.8
random_md53 1,2,3      0.75
random_md54 1          0.5

I am basically trying to find "similar blockhash" values between the list in the query and the SQL table.

About the ratio column:

If randstr1 appears only once in the list in the SQL query, then I want at most 1 match for randstr1 in the SQL table (per filehash).

In the third output row, ratio is 0.75 because randstr1 appears only once in the query list, even though it appears twice in the MySQL table for that filehash. So in the third row, 3 of 4 blocks match. randstr2 is matched both times in the third row because it appears 2 or more times in the query list.

About the pos column: I just want to know the pos values of the matched blocks.
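To make the rule above concrete, here is a minimal sketch in plain Python (not SQL) that applies it to the sample data. The function name `match_blocks` and the in-memory `ROWS` / `QUERY` lists are made up for illustration; the sketch assumes that each hash in the query list may match at most as many rows per filehash as it occurs in the list, taking rows in pos order.

```python
from collections import Counter, defaultdict

# Sample rows as (pos, filehash, blockhash), copied from the INSERT above.
ROWS = [
    (1, "random_md51", "randstr1"), (2, "random_md51", "randstr2"),
    (3, "random_md51", "randstr3"),
    (1, "random_md52", "randstr2"), (2, "random_md52", "randstr2"),
    (3, "random_md52", "randstr2"), (4, "random_md52", "randstr1"),
    (5, "random_md52", "randstr7"),
    (1, "random_md53", "randstr2"), (2, "random_md53", "randstr1"),
    (3, "random_md53", "randstr2"), (4, "random_md53", "randstr1"),
    (1, "random_md54", "randstr1"), (2, "random_md54", "randstr55"),
]

# The list from the IN (...) clause; duplicates are meaningful here.
QUERY = ["randstr1", "randstr2", "randstr3", "randstr2", "randstr2"]


def match_blocks(rows, query_hashes):
    """Return {filehash: (matched_pos_list, ratio)} (hypothetical helper)."""
    caps = Counter(query_hashes)  # max matches allowed per hash, per file
    by_file = defaultdict(list)
    for pos, filehash, blockhash in rows:
        by_file[filehash].append((pos, blockhash))

    result = {}
    for filehash, blocks in by_file.items():
        remaining = caps.copy()  # fresh budget for every file
        matched = []
        for pos, blockhash in sorted(blocks):  # walk blocks in pos order
            if remaining[blockhash] > 0:
                remaining[blockhash] -= 1
                matched.append(pos)
        result[filehash] = (matched, len(matched) / len(blocks))
    return result


if __name__ == "__main__":
    for filehash, (pos, ratio) in sorted(match_blocks(ROWS, QUERY).items()):
        print(filehash, ",".join(map(str, pos)), ratio)
```

For the sample data this reproduces the expected output: random_md53 matches pos 1,2,3 with ratio 0.75, because the second randstr1 (pos 4) has no budget left.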

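With the ROW_NUMBER() window function (available in MySQL 8.0 and later) you can number the occurrences of each blockhash within a filehash, so you can ignore 'randstr1' when it appears more than once and 'randstr2' when it appears more than 3 times: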

with 
  row_numbers as (
    select *, 
      row_number() over (partition by filehash, blockhash order by pos) rn
    from blockhashtable 
  ),
  cte as (
    select *, 
    (blockhash = 'randstr1' and rn = 1)
    or 
    (blockhash = 'randstr2' and rn <= 3)
    or 
    (blockhash = 'randstr3') valid
    from row_numbers
  )
select filehash,
  group_concat(case when valid then pos end order by pos) pos,
  avg(valid) as ratio
from cte
group by filehash

See the demo.
Results:

> filehash    | pos     |  ratio
> :---------- | :------ | -----:
> random_md51 | 1,2,3   | 1.00
> random_md52 | 1,2,3,4 | 0.80
> random_md53 | 1,2,3   | 0.75
> random_md54 | 1       | 0.50
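The limits 1 and 3 in the query above are hand-counted from the list. If the list changes often, the condition could be generated instead. The following is only a sketch in Python: `build_valid_predicate` is a hypothetical helper, and the `%s` placeholder style is an assumption that fits drivers such as PyMySQL; other drivers use different placeholders. Unlike the query above, which leaves 'randstr3' uncapped, this sketch caps every hash at its count in the list; for the sample data the result is the same.

```python
from collections import Counter


def build_valid_predicate(query_hashes, placeholder="%s"):
    """Build the `valid` condition and its parameters (hypothetical helper).

    Each distinct hash gets one clause limiting rn to the number of
    times that hash occurs in the query list.
    """
    caps = Counter(query_hashes)  # keeps first-seen order of the hashes
    clauses, params = [], []
    for blockhash, cap in caps.items():
        clauses.append(f"(blockhash = {placeholder} and rn <= {placeholder})")
        params.extend([blockhash, cap])
    if not clauses:
        return "0", []  # empty list: nothing can match
    return " or ".join(clauses), params


if __name__ == "__main__":
    sql, params = build_valid_predicate(
        ["randstr1", "randstr2", "randstr3", "randstr2", "randstr2"]
    )
    print(sql)
    print(params)
```

The returned fragment would replace the hard-coded condition in the cte, with the parameters passed to the driver separately rather than concatenated into the SQL text.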
