
How can I track outgoing link clicks without tracking bots?

I have a few thoughts on this, but I can see problems with both. I don't need 100% accurate data; an 80% solution that lets me make generalizations about the most popular domains I'm routing users to is fine.

Option 1 - Use PHP. Route links through a file, track.php, that makes sure the referring page is from my domain before tracking the click. This page then routes the user to the final intended URL. Obviously bots could spoof this, but do many actually? I could also check the user agent, though again, I know many bots spoof that as well.
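A minimal sketch of what Option 1 could look like. The referer check and "bot" substring check are just the ideas from the question; log_click() and example.com are placeholders for your own logging code and domain:

```php
<?php
// track.php - sketch of the Option 1 redirect tracker (not production code).
// log_click() and example.com are placeholders for your own logic and domain.

function is_internal_referer(?string $referer, string $ourHost): bool
{
    if ($referer === null || $referer === '') {
        return false; // no referer: could be a bot or a privacy-conscious browser
    }
    return parse_url($referer, PHP_URL_HOST) === $ourHost;
}

function looks_like_bot(string $userAgent): bool
{
    // Crude but useful: many honest crawlers identify themselves with "bot".
    return stripos($userAgent, 'bot') !== false;
}

if (PHP_SAPI !== 'cli') { // only redirect when serving a real HTTP request
    $url = $_GET['url'] ?? '/';
    $referer = $_SERVER['HTTP_REFERER'] ?? '';
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

    if (is_internal_referer($referer, 'example.com') && !looks_like_bot($ua)) {
        // log_click($url); // hypothetical helper: write the click to the database
    }

    header('Location: ' . $url, true, 302);
    exit;
}
```

In production you would also want to validate $url against your own list of outgoing links, since an unchecked Location redirect is an open-redirect vulnerability.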

Option 2 - Use JavaScript. Execute a JavaScript on-click function that writes the click to the database and then directs the user to the final URL.
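Option 2 might be sketched like this. The /log-click endpoint is hypothetical (substitute your own tracker URL), and navigator.sendBeacon is used so the logging request survives the navigation away from the page:

```javascript
// Sketch of Option 2: log the click client-side, then let navigation proceed.
// The /log-click endpoint is hypothetical -- substitute your own tracker URL.

// Pure helper: does this href leave our site? (Testable outside a browser.)
function isExternalLink(href, ourHost) {
  try {
    return new URL(href, 'https://' + ourHost).hostname !== ourHost;
  } catch (e) {
    return false; // malformed href: don't bother tracking it
  }
}

function trackOutgoingClick(event) {
  const link = event.currentTarget;
  // sendBeacon queues the request so it survives the page unload;
  // a plain fetch() can be cancelled when the browser navigates away.
  if (navigator.sendBeacon) {
    navigator.sendBeacon('/log-click', JSON.stringify({ url: link.href }));
  }
  // No preventDefault(): the browser follows the link normally.
}

// Browser-only wiring: attach the handler to every external link.
if (typeof document !== 'undefined') {
  document.querySelectorAll('a[href]').forEach((a) => {
    if (isExternalLink(a.href, location.hostname)) {
      a.addEventListener('click', trackOutgoingClick);
    }
  });
}
```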

Both of these methods feel like they may cause problems with crawlers following my outgoing links. What is the most effective method for tracking these outgoing clicks?

The most effective method for tracking outgoing links (it's used by Facebook, Twitter, and almost every search engine) is a "track.php"-type file.

Detecting bots can be considered a separate problem, and the methods are covered fairly well by these questions: http://duckduckgo.com/?q=how+to+detect+http+bots+site%3Astackoverflow.com. But doing a simple string search for "bot" in the User-Agent will probably get you close to your 80%* (and watching for hits to /robots.txt will, depending on the type of bot you're dealing with, get you to 95%*).
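The robots.txt-watching idea could be sketched by serving robots.txt from a script (e.g. via a rewrite rule) and remembering who fetched it, since real browsers almost never request that file. The script name, storage path, and rewrite setup here are all assumptions:

```php
<?php
// robots.php - hypothetical dynamic robots.txt that remembers its fetchers.
// Real browsers almost never request robots.txt, so fetchers are likely bots.

const BOT_LIST = '/tmp/robots-fetchers.txt'; // assumed storage location

function remember_bot(string $ip, string $file = BOT_LIST): void
{
    file_put_contents($file, $ip . "\n", FILE_APPEND | LOCK_EX);
}

function seen_as_bot(string $ip, string $file = BOT_LIST): bool
{
    return is_file($file)
        && in_array($ip, file($file, FILE_IGNORE_NEW_LINES), true);
}

if (PHP_SAPI !== 'cli') { // serving a real /robots.txt request
    remember_bot($_SERVER['REMOTE_ADDR'] ?? 'unknown');
    header('Content-Type: text/plain');
    echo "User-agent: *\nDisallow: /track.php\n";
}
```

A track.php-style tracker could then call seen_as_bot($_SERVER['REMOTE_ADDR']) before logging. This only catches well-behaved crawlers, which is consistent with the hedged percentages in the answer.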

*: a semi-educated guess, based on zero concrete data

Well, Google Analytics and Piwik use JavaScript for that.

Since bots can't use JS, you'll only count humans. On the other hand, humans can disable JS too (but honestly, that's rarely the case).

Facebook, DeviantArt, WLM, etc. use server-side scripts to track. I don't know how they filter bots, but a nice robots.txt with one or two filters should be good enough to get 80%, I guess.
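Assuming clicks are routed through a track.php-style redirect as described in the question, the one-or-two-line robots.txt this answer alludes to might look like:

```
User-agent: *
Disallow: /track.php
```

Well-behaved crawlers will then skip the tracker entirely, which also addresses the question's worry about crawlers following the outgoing links.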
