简体   繁体   中英

Select MySQL results, ignoring fields that have a count greater than X

I have the following MySQL table setup that logs site visits:

id   timestamp   ip   tracking
  • id = auto generated
  • timestamp = standard datetime
  • ip = users ip address
  • tracking = a tracking code that is passed via the URL

The purpose of this tracking is various forwarded domains are sent to this one site with this script running. The script logs the ip and timestamp, and grabs the tracking code from the URL.

What we are tying to do is have a rollup count of all tracking codes used, however, there are a LOT of spam requests (bots) hitting the site. I'm trying to figure out the best way to filter out results that I think are bots, I don't want them counted in the final results.

My initial thought was to first filter out all ips that occur more than once. My problem is, how do I then use the results of that query to go back and count the tracking codes?

My code to filter ips that only show once is:

SELECT tracking, ip, COUNT( * ) 
FROM tracking
GROUP BY ip
HAVING COUNT( * ) =1
ORDER BY COUNT( * ) DESC

How do I then take those results and run another query to then count and sum up the tracking codes?

-Kevin

EDIT:

Sorry, first post here and I rushed a little. In the end, what I'm looking for is to get the count of all the tracking codes used.

Let's assume I have the following table data:

id       timestamp               ip             tracking
--       ---------               --             --------
1        2014-01-10 23:43:10     192.168.1.1    100
2        2014-01-10 23:43:10     192.168.1.1    200
3        2014-01-10 23:43:10     192.168.1.2    100
4        2014-01-10 23:43:10     192.168.1.1    999
5        2014-01-10 23:43:10     192.168.1.1    100
6        2014-01-12 23:43:10     192.168.1.1    100
7        2014-01-12 23:43:10     192.168.1.3    100
8        2014-01-12 23:43:10     192.168.1.4    100
9        2014-01-12 23:43:10     192.168.1.5    600
10       2014-01-12 23:43:10     192.168.1.1    888
11       2014-01-12 23:43:10     192.168.1.1    888
12       2014-01-12 23:43:10     192.168.1.8    200
13       2014-01-12 23:43:10     192.168.1.9    300
14       2014-01-12 23:43:10     192.168.1.10   100
15       2014-01-12 23:43:10     192.168.1.11   400
16       2014-01-12 23:43:10     192.168.1.1    888
17       2014-01-12 23:43:10     192.168.1.12   200
18       2014-01-12 23:43:10     192.168.1.2    777
19       2014-01-12 23:43:10     192.168.1.2    100
20       2014-01-12 23:43:10     192.168.1.1    200
21       2014-01-12 23:43:10     192.168.1.4    789

In the end I want to display a count of all tracking codes used, but to ignore any rows where an ip address looks to be from a bot. Because of the nature of this setup, we kind of assume that ip addresses would only hit the site once, maybe twice. So I figure, if I can get the count of tracking codes, excluding any row where the ip address is greater than 1 (or maybe 2).

So the final result from that data set would be

tracking  count
--------  -----
100         3
200         2
300         1
400         1
600         1
789         1

Basically from the results we are not counting anything from 192.168.1.1 and 192.168.1.2 because those ips visited more than 1 time.

EDIT - I added a row #21 to have one of the IPs visit twice, therefore both of their visits should count if we are using <3 in the query. It looks like the below answer isn't working correctly. When I add the row #21, the code 789 doesn't get counted

Hope this helps understand it better?

I know how to get the overall count of either ips or tracking, but I can't figure out how to put the two together in one query.

-Kevin

EDIT 2/4/14 - So I what I think is happening is the query below is only counting the tracking code of the first instance of the IP. So in the case let's change the table to have a better set of data

id       timestamp               ip             tracking
--       ---------               --             --------
1        2014-01-10 23:43:10     192.168.1.1    100
2        2014-01-10 23:43:10     192.168.1.222  100
3        2014-01-10 23:43:10     192.168.1.1    200
4        2014-01-10 23:43:10     192.168.1.2    100
5        2014-01-10 23:43:10     192.168.1.1    999
6        2014-01-12 23:43:10     192.168.1.1    100
7        2014-01-12 23:43:10     192.168.1.2    100
8        2014-01-12 23:43:10     192.168.1.3    100
9        2014-01-12 23:43:10     192.168.1.4    100
10       2014-01-12 23:43:10     192.168.1.5    600
11       2014-01-12 23:43:10     192.168.1.1    888
12       2014-01-12 23:43:10     192.168.1.1    888
13       2014-01-12 23:43:10     192.168.1.8    200
14       2014-01-12 23:43:10     192.168.1.9    300
15       2014-01-12 23:43:10     192.168.1.10   100
16       2014-01-12 23:43:10     192.168.1.11   400
17       2014-01-12 23:43:10     192.168.1.1    888
18       2014-01-12 23:43:10     192.168.1.12   200
19       2014-01-12 23:43:10     192.168.1.222  777
20       2014-01-12 23:43:10     192.168.1.2    100
21       2014-01-12 23:43:10     192.168.1.1    200
22       2014-01-12 23:43:10     192.168.1.4    789

In this case, I would want the query to be where any IP appears 2 or less times. So the results SHOULD be:

tracking  count
--------  -----
100         4
200         2
300         1
400         1
600         1
777         1
789         1

Basically, 192.168.1.1 and .2 are the only ones that appear more than 2 times, so they should be excluded. Some IPs, like .4 and .222 appear twice, which is fine, but each time they use a different code.

Using the query below:

select xyz.tracking,count(xyz.tracking) as `count` from (select ip,count(ip),tracking from tracking group by ip having count(ip)<3) xyz group by xyz.tracking;

I seems to only pick up the code for the first instance of each IP. So the results I get are:

tracking  count
--------  -----
100         4
200         2
300         1
400         1
600         1

So in this case it's picking up the code 100 for IP .222 but not the code 777 for IP .222 It's picking up code 100 for IP .4 but not the code 789 for IP .4.

Anyone have any ideas how to resolve this?

EDIT: So I think I have a solution. It's returning the correct values. Can someone verify?

SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t 
JOIN (
    SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d 
ON d.ip = t.ip
GROUP BY t.tracking

I believe I found the answer in case anyone else needs a query like this.

SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t 
JOIN (
    SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d 
ON d.ip = t.ip
GROUP BY t.tracking

试试这个:

select xyz.tracking,count(xyz.tracking) as `count` from (select ip,count(ip),tracking from tracking group by ip having count(ip)<3) xyz group by xyz.tracking;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM