SQL Formula for mysql Table

Hello, I have a DB table (MySQL ver 5.6.41-84.1-log) that has about 92,000 entries, with columns for:

  • id (incremental unique ID)
  • post_type (not important)
  • post_id (not important, but shows relation to another table)
  • user_id (not important)
  • vote (not important)
  • ip (IP address, e.g. 123.123.123.123)
  • voted (datestamp in GMT, e.g. 2018-12-03 04:50:05)

I recently ran a contest and we had a rule that no single IP could vote more than 60 times per day. So now I need to run a custom SQL query that applies the following rule:

For each IP address, for each day, if there are > 60 rows, delete those additional rows.

Thank you for your help!

This is a complicated one, and I think it is hard to provide a 100% sure answer without the actual table and data to play with.

However, let me try to describe the logic and build the query step by step, so you can play around with it and possibly fix lurking errors.

1) We start by selecting all ip addresses that posted more than 60 votes on a given day. For this we use a group by on the voting day and on the ip address, combined with a having clause.

-- table_name is a placeholder for the real table name; the ip column is named ip in the question
select date(voted), ip
from table_name
group by date(voted), ip
having count(*) > 60
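
To try step 1 without touching the live data, here is a minimal sketch that runs the same group by / having query against a throwaway in-memory SQLite database through Python's sqlite3 module. The table name `votes`, the sample rows, and the lowered limit of 2 (standing in for 60) are assumptions made only for illustration; SQLite stands in for MySQL here, and both provide a date() function that truncates a datetime to its day.

```python
import sqlite3

# Throwaway in-memory database; `votes` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, ip TEXT, voted TEXT)")
conn.executemany(
    "INSERT INTO votes (ip, voted) VALUES (?, ?)",
    [
        ("1.1.1.1", "2018-12-03 04:50:05"),
        ("1.1.1.1", "2018-12-03 05:00:00"),
        ("1.1.1.1", "2018-12-03 06:00:00"),
        ("1.1.1.1", "2018-12-04 01:00:00"),  # same ip, next day
        ("2.2.2.2", "2018-12-03 07:00:00"),
    ],
)

LIMIT_PER_DAY = 2  # stands in for the contest limit of 60

# One row per (day, ip) pair that exceeded the limit.
over_limit = conn.execute(
    """
    SELECT date(voted) AS day_voted, ip, COUNT(*) AS n_votes
    FROM votes
    GROUP BY date(voted), ip
    HAVING COUNT(*) > ?
    ORDER BY day_voted, ip
    """,
    (LIMIT_PER_DAY,),
).fetchall()

print(over_limit)
```

Only 1.1.1.1 on 2018-12-03 is reported, because its single vote on the following day falls into a separate group.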

2) From there, we go back to the table and select the first 60 ids corresponding to each voting day / ip address pair. id is an auto-incremented field, so we just sort on this field and then use the mysql limit instruction.

    select id, ip, date(voted) as day_voted
    from table_name
    where (date(voted), ip) in (
        select date(voted), ip
        from table_name
        group by date(voted), ip
        having count(*) > 60
    )
    order by id
    limit 60  -- caution: this caps the whole result, not each day / ip pair

3) Finally, we go back once again to the table and search for all the ids whose ip address and day of vote belong to the above list, but whose id is greater than the max id of the list. This is achieved with a join and requires a group by clause.

select t1.id
from
    table_name t1
    join (
        select id, ip, date(voted) as day_voted
        from table_name
        where (date(voted), ip) in (
            select date(voted), ip
            from table_name
            group by date(voted), ip
            having count(*) > 60
        )
        order by id
        limit 60
    ) t2
        on t1.ip = t2.ip
        and date(t1.voted) = t2.day_voted
group by t1.id
-- an aggregate cannot appear in the join condition, so the max test goes in having
having t1.id > max(t2.id)

That should return the list of all ids that we need to delete. Test it before you go further.
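
Because the limit 60 in step 2 applies to the whole result set rather than to each day / ip pair, the list may come out incomplete when several pairs are over the limit. One alternative, sketched here under assumptions rather than taken from the answer, is a correlated count: a row is surplus when at least 60 rows with the same ip and day have a smaller id. The sketch uses Python's sqlite3 with a hypothetical `votes` table and a limit of 2 in place of 60; the SQL avoids anything SQLite-specific, but it has not been run against MySQL 5.6 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, ip TEXT, voted TEXT)")
conn.executemany(
    "INSERT INTO votes (ip, voted) VALUES (?, ?)",
    [
        ("1.1.1.1", "2018-12-03 04:50:05"),  # id 1
        ("2.2.2.2", "2018-12-03 04:55:00"),  # id 2
        ("1.1.1.1", "2018-12-03 05:00:00"),  # id 3
        ("1.1.1.1", "2018-12-03 06:00:00"),  # id 4, third vote that day
        ("1.1.1.1", "2018-12-03 07:00:00"),  # id 5, fourth vote that day
        ("1.1.1.1", "2018-12-04 01:00:00"),  # id 6, first vote of a new day
    ],
)

LIMIT_PER_DAY = 2  # stands in for 60

# A vote is surplus when LIMIT_PER_DAY or more earlier votes (smaller id)
# share its ip and its calendar day.
surplus_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.id
        FROM votes t1
        WHERE (
            SELECT COUNT(*)
            FROM votes t2
            WHERE t2.ip = t1.ip
              AND date(t2.voted) = date(t1.voted)
              AND t2.id < t1.id
        ) >= ?
        ORDER BY t1.id
        """,
        (LIMIT_PER_DAY,),
    )
]

print(surplus_ids)
```

On a table of about 92,000 rows the correlated subquery may be slow without an index that covers ip, so it is worth timing on a copy first.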

4) The very last step is to delete those ids. There are limitations in mysql that make a delete with a subquery condition awkward to achieve: the table being deleted from cannot be selected from directly in a subquery of the same statement. See the related SO question for more information on the technical background. You can either use a temporary table to store the selected ids, or try to outsmart mysql by wrapping the subquery and aliasing it. Let us try the second option:

delete t.* from table_name t where t.id in ( select id from (
    select t1.id
    from
        table_name t1
        join (
            select id, ip, date(voted) as day_voted
            from table_name
            where (date(voted), ip) in (
                select date(voted), ip
                from table_name
                group by date(voted), ip
                having count(*) > 60
            )
            order by id
            limit 60
        ) t2
            on t1.ip = t2.ip
            and date(t1.voted) = t2.day_voted
    group by t1.id
    -- the max test lives in having because aggregates are not allowed in the join condition
    having t1.id > max(t2.id)
) x );
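
The first option mentioned in step 4, a temporary table holding the ids to delete, could look roughly like the sketch below. It is an illustration under assumptions, not the answer's own code: the table names `votes` and `surplus` are hypothetical, the limit is 3 instead of 60, the surplus rows are found with a correlated count, and the statements are run on SQLite through Python's sqlite3 rather than on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (id INTEGER PRIMARY KEY, ip TEXT, voted TEXT)")
conn.executemany(
    "INSERT INTO votes (ip, voted) VALUES (?, ?)",
    # ids 1-5: five votes from one ip on the same day; id 6: another ip
    [("1.1.1.1", "2018-12-03 0%d:00:00" % hour) for hour in range(1, 6)]
    + [("2.2.2.2", "2018-12-03 08:00:00")],
)

LIMIT_PER_DAY = 3  # stands in for 60

# 1. Collect the surplus ids in a temporary table.
conn.execute("CREATE TEMPORARY TABLE surplus (id INTEGER PRIMARY KEY)")
conn.execute(
    """
    INSERT INTO surplus (id)
    SELECT t1.id
    FROM votes t1
    WHERE (
        SELECT COUNT(*)
        FROM votes t2
        WHERE t2.ip = t1.ip
          AND date(t2.voted) = date(t1.voted)
          AND t2.id < t1.id
    ) >= ?
    """,
    (LIMIT_PER_DAY,),
)

# 2. Delete by id; the subquery reads only the temporary table.
deleted = conn.execute(
    "DELETE FROM votes WHERE id IN (SELECT id FROM surplus)"
).rowcount

# 3. Clean up and inspect what is left.
conn.execute("DROP TABLE surplus")
remaining = [row[0] for row in conn.execute("SELECT id FROM votes ORDER BY id")]

print(deleted, remaining)
```

Keeping the ids in a separate table also makes it easy to inspect exactly what will be removed before running the delete.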

Hope this helps !希望这可以帮助 !

You could approach this by vastly simplifying your sample data and using row number simulation for mysql versions prior to 8.0, or a window function for version 8.0 or above. I assume you are not on version 8 or above in the following example.

drop table if exists t;
create table t(id int auto_increment primary key,ip varchar(2));
insert into t (ip) values
(1),(1),(3),(3),
(2),
(3),(3),(1),(2);

delete t1 from t t1 join
(
select id,rownumber from
(
select t.*,
         if(ip <> @p,@r:=1,@r:=@r+1) rownumber,
         @p:=ip p
from t
cross join (select @r:=0,@p:=0) r
order by ip,id
)s
where rownumber > 2
) a on a.id = t1.id;

Working from the inside out: sub query s allocates a row number per ip, sub query a then picks row numbers > 2, and the outer multi-table delete deletes from t joined to a, to give

+----+------+
| id | ip   |
+----+------+
|  1 | 1    |
|  2 | 1    |
|  3 | 3    |
|  4 | 3    |
|  5 | 2    |
|  9 | 2    |
+----+------+
6 rows in set (0.00 sec)
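
For version 8.0 or above, the window function route mentioned at the start of this answer might look like the following sketch. It reuses the same nine sample rows and the same cut-off of 2 rows per ip, but runs on SQLite through Python's sqlite3 (assuming SQLite 3.25 or later, which is when window functions were added) instead of MySQL 8, and it uses a delete with an in subquery because SQLite has no multi-table delete. For the contest rule, the partition would presumably be ip plus date(voted) and the cut-off 60.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, ip TEXT)")
# Same sample data as the example above: ips 1,1,3,3,2,3,3,1,2 get ids 1-9.
conn.executemany("INSERT INTO t (ip) VALUES (?)", [(ip,) for ip in "113323312"])

KEEP = 2  # rows to keep per ip

# ROW_NUMBER() numbers the rows of each ip in id order;
# everything past the first KEEP rows of a partition is deleted.
conn.execute(
    """
    DELETE FROM t
    WHERE id IN (
        SELECT id
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (PARTITION BY ip ORDER BY id) AS rn
            FROM t
        ) ranked
        WHERE rn > ?
    )
    """,
    (KEEP,),
)

remaining = conn.execute("SELECT id, ip FROM t ORDER BY id").fetchall()
print(remaining)
```

The surviving rows match the table shown above: ids 1, 2, 3, 4, 5 and 9.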

I had someone help me write the following query, which addressed my question.

SET SQL_SAFE_UPDATES = 0;
create table temp( SELECT id, ip, voted
    FROM
        (SELECT id, ip, voted,
            @ip_rank := IF(@current_ip = ip, @ip_rank + 1, 1) AS ip_rank,
            @current_ip := ip
        FROM `table_name` where ip in (SELECT ip from `table_name` group by date(voted),ip having count(*) >60)
        ORDER BY ip, voted desc
        ) ranked
    WHERE ip_rank <= 2);
DELETE FROM `table_name`
WHERE id not in (select id from temp) and ip in (select ip from temp);
drop table temp;
