
How can I do "related tags"?

I have tags on my website, and I enter them one by one when I create a blog post. I love Gmail's new feature that asks whether you want to include X in a mail when you type Y's name, if you often include both of them in the same messages.

I'd like to do something similar on my website, but I don't know how to represent the tags' "related-ness" in an object or database... thoughts?

It all boils down to creating associations between certain characteristics of your posts and certain tags, and then, when you press the "publish" button, analysing the new post and proposing all tags that match its characteristics.

This can be done in several ways, from a "totally hard-coded" association to some sort of "learning AI"... and everything in between.

Hard-coded solutions

These are the simplest algorithms to implement. You should first decide which characteristics of your posts are relevant for tagging (e.g. their length if you tag them "short" or "long", the presence of photos or videos if you tag them "multimedia-content", etc.). The most obvious choice, however, is to focus on which words are used in the posts. For example, you could build a mapping like this:

tag_hint_words = {'code-development' : ['programming', 
                                        'language', 'python', 'function', 
                                        'object', 'method'],
                  'family' : ['Theresa', 'kids', 
                              'uncle Ben', 'holidays']}

Then you would check your post for the presence of the words in each list (the items between [ and ]) and propose the tag (the key before the :) as a possible candidate.

A common approach is to give "scores", or in other words to attach a number that indicates how likely a given tag is to be the right one. For example, if your post contained the sentence...

After months of programming, we finally left for the summer holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!

...despite the presence of the word "programming", the program should indicate family as the most likely tag, as many more of that tag's hint words appear in the text.
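The scoring idea above can be sketched roughly as follows. This is only a minimal illustration: the function name score_tags is made up for this example, and the score is a raw count of distinct hint words found in the post, not a true probability.

```python
import re

# Mapping from the answer above: tag -> list of hint words.
tag_hint_words = {
    'code-development': ['programming', 'language', 'python',
                         'function', 'object', 'method'],
    'family': ['Theresa', 'kids', 'uncle Ben', 'holidays'],
}

def score_tags(post, hints=tag_hint_words):
    """Return (tag, score) pairs, best first.

    The score is the number of distinct hint words of a tag that occur
    in the post (case-insensitive, whole words or phrases only).
    Hypothetical helper, not part of any library.
    """
    text = post.lower()
    scores = {}
    for tag, words in hints.items():
        hits = sum(
            1 for word in words
            if re.search(r'\b' + re.escape(word.lower()) + r'\b', text)
        )
        if hits:
            scores[tag] = hits
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

post = ("After months of programming, we finally left for the summer "
        "holidays at uncle Ben's cottage. Theresa and the kids were ecstatic!")
print(score_tags(post))  # [('family', 4), ('code-development', 1)]
```

Dividing each count by the total number of hits would turn the scores into rough proportions, if something closer to a probability is wanted.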

Learning AIs

One obvious limitation of the above method is that if, say, one day you pick up Java alongside Python, you would probably need to change your code to include words like "java" or "oracle" too. The same applies if you create new tags.

To circumvent this limitation (and have some fun...) you could try to implement a learning algorithm. Learning algorithms are those that refine their results the more you use them (so they indeed... learn). Some algorithms require initial training (many spam filters and voice recognition programs need this initial "primer"). Some don't.

I am absolutely no expert on the subject, but two common AIs are the Naive Bayes classifier and some flavour of neural network.

Although the Wikipedia pages might look scary, they are surprisingly easy to implement (at least in Python). Here's the recording of a lecture at PyCon 2009 on the subject "Easy AI with Python". I found it very informative and even somewhat inspiring :)
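To give a feel for the Naive Bayes idea mentioned above, here is a small sketch using only the standard library. It is not taken from the lecture or from any particular library: the class name NaiveBayesTagger, its methods, and the two training sentences are all made up for illustration, and it assumes a simple word-count model with add-one smoothing.

```python
import math
import re
from collections import Counter, defaultdict

class NaiveBayesTagger:
    """Toy Naive Bayes text classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.tag_counts = Counter()               # posts seen per tag
        self.word_counts = defaultdict(Counter)   # word frequencies per tag
        self.vocab = set()                        # all words seen in training

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, tag):
        """Learn from one post that the author tagged with `tag`."""
        words = self.tokenize(text)
        self.tag_counts[tag] += 1
        self.word_counts[tag].update(words)
        self.vocab.update(words)

    def rank(self, text):
        """Return the known tags, most likely first."""
        words = self.tokenize(text)
        total_posts = sum(self.tag_counts.values())
        scores = {}
        for tag, n_posts in self.tag_counts.items():
            total_words = sum(self.word_counts[tag].values())
            # log prior: how often this tag has been used so far
            log_p = math.log(n_posts / total_posts)
            for word in words:
                # add-one smoothing so unseen words do not zero the score
                count = self.word_counts[tag][word] + 1
                log_p += math.log(count / (total_words + len(self.vocab)))
            scores[tag] = log_p
        return sorted(scores, key=scores.get, reverse=True)

tagger = NaiveBayesTagger()
tagger.train("I wrote a python function and a method on an object",
             "code-development")
tagger.train("the kids and Theresa enjoyed the holidays with uncle Ben",
             "family")
print(tagger.rank("the kids loved the holidays")[0])  # family
```

Each time you publish a post, calling train with the tags you actually chose lets the suggestions improve over time, with no hint lists to maintain by hand.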

HTH!

Look up clustering (a machine learning technique). Don't be intimidated by the math; it's a pretty straightforward algorithm. Check out Machine Learning for Hackers for simpler explanations of many machine learning algorithms and methods.

You should have a look at this post: Any suggestions for a db schema for storing related keywords?

If you're looking for a schema for storing related tags, it will help.

Relevancy searches where multiple agents play a part are usually done using collaborative filtering. You might want to give that a look.
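A much simpler cousin of collaborative filtering, and close to the Gmail behaviour described in the question, is to count how often tags appear together on past posts. The sketch below is an assumption about how one might do that, not a full collaborative-filtering implementation. The function names and the sample posts are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(posts_tags):
    """Map each tag to a Counter of the tags it has appeared with."""
    related = defaultdict(Counter)
    for tags in posts_tags:
        # every ordered pair of distinct tags on the same post
        for a, b in permutations(set(tags), 2):
            related[a][b] += 1
    return related

def suggest(related, chosen, limit=3):
    """Suggest tags that most often accompany the already chosen ones."""
    totals = Counter()
    for tag in chosen:
        totals.update(related.get(tag, {}))
    for tag in chosen:
        totals.pop(tag, None)  # never suggest a tag that is already chosen
    return [tag for tag, _ in totals.most_common(limit)]

# Hypothetical tag lists of previously published posts.
posts = [
    ['python', 'django', 'web'],
    ['python', 'django'],
    ['python', 'numpy'],
    ['family', 'holidays'],
]
related = build_cooccurrence(posts)
print(suggest(related, ['python'], limit=1))  # ['django']
```

In a database, the same structure could be stored as one row per tag pair with a counter column, updated whenever a post is published.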


 