User recognition algorithm

Let's say you have a large IRC channel log and you want to find out which users are using multiple accounts. As input you have the times each user connects to the server and some sort of text analysis (word frequency and so on), and as output you want the likelihood that two users "match".

Is it possible to do this with an ANN (artificial neural network)? Are there better algorithms for this task?

PS: using IP addresses is not an accepted solution :)

This problem is known as "authorship detection" (or sometimes, in a particular domain, "plagiarism detection"). It can be done using a variety of statistical algorithms, of which neural networks aren't the easiest.

Check out the Cavnar & Trenkle algorithm for text classification. It can be made into a useful baseline for this task, and implementations in various languages are available on the web. You may want to turn it into a clustering algorithm instead of a classifier.
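As a rough illustration of that idea, here is a minimal sketch of a Cavnar & Trenkle style comparison: each user's text becomes a ranked list of its most frequent character n-grams, and two users are compared with an "out-of-place" rank distance. The tokenization is simplified (plain whitespace splitting), and the names `ngram_profile`, `out_of_place`, `rank_pairs`, the `top_k=300` cutoff, and the symmetric sum of the two distances are assumptions made for this example, not part of any particular implementation. Instead of classifying against known categories, it simply ranks every pair of nicknames by distance, which is one simple way to move toward clustering.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams, most frequent first."""
    counts = Counter()
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries, as in the original paper
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(profile_a, profile_b):
    """Sum of rank differences; n-grams absent from profile_b get a fixed penalty."""
    rank_b = {gram: rank for rank, gram in enumerate(profile_b)}
    max_penalty = len(profile_b)
    return sum(
        abs(rank - rank_b[gram]) if gram in rank_b else max_penalty
        for rank, gram in enumerate(profile_a)
    )


def rank_pairs(logs):
    """logs maps nickname -> all text by that nickname.

    Returns (distance, nick_a, nick_b) tuples, most similar pair first.
    """
    profiles = {nick: ngram_profile(text) for nick, text in logs.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        # Assumption: add both directions so the distance is symmetric.
        dist = (out_of_place(profiles[a], profiles[b])
                + out_of_place(profiles[b], profiles[a]))
        pairs.append((dist, a, b))
    return sorted(pairs)
```

A low distance only says that two nicknames write similarly; it is not a calibrated likelihood, and it would need a threshold or a comparison against known-distinct users before it could be read as a "match".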

The problem with using neural networks is that you need a robust set of training data: lots of examples of people using multiple accounts where you already know that's what they're doing. Furthermore, if the people you're trying to identify have ever played a role-playing game, they'll probably be able to make themselves seem quite a bit different if they want to.

So, if people are acting just like themselves and you have a pretty good training data set, then you stand a chance. You should probably start with methods used in forensic linguistics.

But I suspect that what you'll probably end up doing is identifying people who are sort of similar to each other. Good for a matchmaking site, perhaps; not so cool for most other things. (For example, I would think this would be a perfectly dreadful way to try to find members of Anonymous in other guises.)
