Select MySQL results, ignoring fields that have a count greater than X

I have the following MySQL table that logs site visits:

id   timestamp   ip   tracking
  • id = auto-generated ID
  • timestamp = standard datetime
  • ip = user's IP address
  • tracking = a tracking code that is passed via the URL

The purpose of this tracking is that various forwarded domains are sent to this one site, where this script runs. The script logs the IP and timestamp, and grabs the tracking code from the URL.

What we are trying to do is get a rollup count of all tracking codes used. However, there are a LOT of spam requests (bots) hitting the site. I'm trying to figure out the best way to filter out results that I think are bots; I don't want them counted in the final results.

My initial thought was to first filter out all IPs that occur more than once. My problem is, how do I then use the results of that query to go back and count the tracking codes?

My code to keep only the IPs that show up once is:

SELECT tracking, ip, COUNT( * ) 
FROM tracking
GROUP BY ip
HAVING COUNT( * ) =1
ORDER BY COUNT( * ) DESC

How do I then take those results and run another query to count and sum up the tracking codes?

-Kevin

EDIT:

Sorry, first post here and I rushed a little. In the end, what I'm looking for is a count of all the tracking codes used.

Let's assume I have the following table data:

id       timestamp               ip             tracking
--       ---------               --             --------
1        2014-01-10 23:43:10     192.168.1.1    100
2        2014-01-10 23:43:10     192.168.1.1    200
3        2014-01-10 23:43:10     192.168.1.2    100
4        2014-01-10 23:43:10     192.168.1.1    999
5        2014-01-10 23:43:10     192.168.1.1    100
6        2014-01-12 23:43:10     192.168.1.1    100
7        2014-01-12 23:43:10     192.168.1.3    100
8        2014-01-12 23:43:10     192.168.1.4    100
9        2014-01-12 23:43:10     192.168.1.5    600
10       2014-01-12 23:43:10     192.168.1.1    888
11       2014-01-12 23:43:10     192.168.1.1    888
12       2014-01-12 23:43:10     192.168.1.8    200
13       2014-01-12 23:43:10     192.168.1.9    300
14       2014-01-12 23:43:10     192.168.1.10   100
15       2014-01-12 23:43:10     192.168.1.11   400
16       2014-01-12 23:43:10     192.168.1.1    888
17       2014-01-12 23:43:10     192.168.1.12   200
18       2014-01-12 23:43:10     192.168.1.2    777
19       2014-01-12 23:43:10     192.168.1.2    100
20       2014-01-12 23:43:10     192.168.1.1    200
21       2014-01-12 23:43:10     192.168.1.4    789

In the end I want to display a count of all tracking codes used, but ignore any rows where the IP address looks to be from a bot. Because of the nature of this setup, we assume that an IP address would only hit the site once, maybe twice. So I figure I need the count of tracking codes, excluding any row whose IP address appears more than once (or maybe more than twice).

So the final result from that data set would be:

tracking  count
--------  -----
100         3
200         2
300         1
400         1
600         1
789         1

Basically, the results do not count anything from 192.168.1.1 and 192.168.1.2 because those IPs visited more than 1 time.

EDIT - I added row #21 so that one of the IPs visits twice; both of its visits should therefore count if we are using <3 in the query. It looks like the answer below isn't working correctly. When I add row #21, the code 789 doesn't get counted.

Hope this helps explain it better.

I know how to get the overall count of either IPs or tracking codes, but I can't figure out how to put the two together in one query.

-Kevin

EDIT 2/4/14 - What I think is happening is that the query below only counts the tracking code of the first instance of each IP. So let's change the table to have a better set of data:

id       timestamp               ip             tracking
--       ---------               --             --------
1        2014-01-10 23:43:10     192.168.1.1    100
2        2014-01-10 23:43:10     192.168.1.222  100
3        2014-01-10 23:43:10     192.168.1.1    200
4        2014-01-10 23:43:10     192.168.1.2    100
5        2014-01-10 23:43:10     192.168.1.1    999
6        2014-01-12 23:43:10     192.168.1.1    100
7        2014-01-12 23:43:10     192.168.1.2    100
8        2014-01-12 23:43:10     192.168.1.3    100
9        2014-01-12 23:43:10     192.168.1.4    100
10       2014-01-12 23:43:10     192.168.1.5    600
11       2014-01-12 23:43:10     192.168.1.1    888
12       2014-01-12 23:43:10     192.168.1.1    888
13       2014-01-12 23:43:10     192.168.1.8    200
14       2014-01-12 23:43:10     192.168.1.9    300
15       2014-01-12 23:43:10     192.168.1.10   100
16       2014-01-12 23:43:10     192.168.1.11   400
17       2014-01-12 23:43:10     192.168.1.1    888
18       2014-01-12 23:43:10     192.168.1.12   200
19       2014-01-12 23:43:10     192.168.1.222  777
20       2014-01-12 23:43:10     192.168.1.2    100
21       2014-01-12 23:43:10     192.168.1.1    200
22       2014-01-12 23:43:10     192.168.1.4    789

In this case, I would want the query to keep any IP that appears 2 or fewer times. So the results SHOULD be:

tracking  count
--------  -----
100         4
200         2
300         1
400         1
600         1
777         1
789         1

Basically, 192.168.1.1 and .2 are the only ones that appear more than 2 times, so they should be excluded. Some IPs, like .4 and .222, appear twice, which is fine, but each time they use a different code.

Using the query below:

select xyz.tracking,count(xyz.tracking) as `count` from (select ip,count(ip),tracking from tracking group by ip having count(ip)<3) xyz group by xyz.tracking;

It seems to only pick up the code for the first instance of each IP. So the results I get are:

tracking  count
--------  -----
100         4
200         2
300         1
400         1
600         1

So in this case it's picking up code 100 for IP .222 but not code 777 for IP .222. It's picking up code 100 for IP .4 but not code 789 for IP .4.

Does anyone have any ideas how to resolve this?
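A minimal sketch of why the counts come up short, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL, loaded with the 22 sample rows above. The inner GROUP BY ip yields exactly one row per IP, so the outer query can never count more than one tracking code per IP. Which tracking value survives for a multi-visit IP should not be relied on, and MySQL with ONLY_FULL_GROUP_BY enabled rejects the query instead of picking one.

```python
import sqlite3

# (ip, tracking) pairs copied from the 22-row sample table above;
# id and timestamp are left out because the queries never use them.
ROWS = [
    ("192.168.1.1", 100), ("192.168.1.222", 100), ("192.168.1.1", 200),
    ("192.168.1.2", 100), ("192.168.1.1", 999), ("192.168.1.1", 100),
    ("192.168.1.2", 100), ("192.168.1.3", 100), ("192.168.1.4", 100),
    ("192.168.1.5", 600), ("192.168.1.1", 888), ("192.168.1.1", 888),
    ("192.168.1.8", 200), ("192.168.1.9", 300), ("192.168.1.10", 100),
    ("192.168.1.11", 400), ("192.168.1.1", 888), ("192.168.1.12", 200),
    ("192.168.1.222", 777), ("192.168.1.2", 100), ("192.168.1.1", 200),
    ("192.168.1.4", 789),
]

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking (id INTEGER PRIMARY KEY, ip TEXT, tracking INTEGER)"
)
conn.executemany("INSERT INTO tracking (ip, tracking) VALUES (?, ?)", ROWS)

# The inner query on its own: one output row per qualifying IP.
per_ip = conn.execute(
    "SELECT ip, COUNT(ip) FROM tracking GROUP BY ip HAVING COUNT(ip) < 3"
).fetchall()

# The number of visits that actually belong to those IPs.
visits = conn.execute(
    "SELECT COUNT(*) FROM tracking WHERE ip IN "
    "(SELECT ip FROM tracking GROUP BY ip HAVING COUNT(ip) < 3)"
).fetchone()[0]

print(len(per_ip), "qualifying IPs,", visits, "qualifying visits")
```

The result table above sums to 9 (4 + 2 + 1 + 1 + 1), one per qualifying IP, while the expected table sums to 11, one per qualifying visit. The two missing visits are the second rows for .222 and .4.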

EDIT: So I think I have a solution. It's returning the correct values. Can someone verify?

SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t 
JOIN (
    SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d 
ON d.ip = t.ip
GROUP BY t.tracking

I believe I found the answer, in case anyone else needs a query like this.

SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t 
JOIN (
    SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d 
ON d.ip = t.ip
GROUP BY t.tracking
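As a check on the query above, here is a minimal sketch, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL and the 22-row sample table from the question. The alias visit_count and the ORDER BY are additions for readable, deterministic output and are not part of the original query. The unused COUNT column in the derived table is dropped, which does not change the result.

```python
import sqlite3

# (ip, tracking) pairs copied from the 22-row sample table in the question.
ROWS = [
    ("192.168.1.1", 100), ("192.168.1.222", 100), ("192.168.1.1", 200),
    ("192.168.1.2", 100), ("192.168.1.1", 999), ("192.168.1.1", 100),
    ("192.168.1.2", 100), ("192.168.1.3", 100), ("192.168.1.4", 100),
    ("192.168.1.5", 600), ("192.168.1.1", 888), ("192.168.1.1", 888),
    ("192.168.1.8", 200), ("192.168.1.9", 300), ("192.168.1.10", 100),
    ("192.168.1.11", 400), ("192.168.1.1", 888), ("192.168.1.12", 200),
    ("192.168.1.222", 777), ("192.168.1.2", 100), ("192.168.1.1", 200),
    ("192.168.1.4", 789),
]

# In-memory SQLite database standing in for the MySQL table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracking (id INTEGER PRIMARY KEY, ip TEXT, tracking INTEGER)"
)
conn.executemany("INSERT INTO tracking (ip, tracking) VALUES (?, ?)", ROWS)

# Join every visit row back to the set of IPs seen at most twice,
# then count the surviving rows per tracking code.
result = conn.execute(
    """
    SELECT t.tracking, COUNT(t.tracking) AS visit_count
    FROM tracking t
    JOIN (
        SELECT s.ip
        FROM tracking s
        GROUP BY s.ip
        HAVING COUNT(s.ip) <= 2
    ) d ON d.ip = t.ip
    GROUP BY t.tracking
    ORDER BY t.tracking
    """
).fetchall()

for code, visit_count in result:
    print(code, visit_count)
```

On this data the join returns the seven rows the question lists as the desired result, including codes 777 and 789. Because the derived table only decides which IPs qualify and the counting runs over the original rows, every visit from a qualifying IP is counted.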

Try this:

select xyz.tracking,count(xyz.tracking) as `count` from (select ip,count(ip),tracking from tracking group by ip having count(ip)<3) xyz group by xyz.tracking;
