How to detect events using NLP and machine learning?

I have text that describes events such as a birth, a new job, a wedding, or a death, or that describes no event at all. How do I detect these events?

My approach is to build a set of words for each event and search for them in the text. Alternatively, I could use a Bayesian classifier, but that requires training examples for every class. I need a method that can classify even without being given examples of every type. Is that possible?

What are your performance requirements? Is low recall OK? Do you need high precision?

Based on your question, I'm guessing you want something with reasonable recall (reading classified ads to send out spam?) but don't actually have any training data.

You want a method that can classify documents (using events from a Named Entity Recognition algorithm as features) without providing any training data. All supervised methods (including Bayesian ones) require training data, so what you are asking for is not possible. You need labelled data in any case; otherwise, how can you tell how well your detection process is working?
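To make "how well your detection process is working" concrete, one option is to hand-label a small sample and compute precision and recall per event type. The following is only a sketch: it assumes one label per document, uses a hypothetical `none` label for text with no event, and the function name and sample labels are made up for illustration.

```python
def precision_recall(gold, predicted, event):
    """Compute (precision, recall) for one event type.

    gold and predicted are equal-length lists of labels, one per document.
    """
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == event and p == event)
    false_pos = sum(1 for g, p in pairs if g != event and p == event)
    false_neg = sum(1 for g, p in pairs if g == event and p != event)
    # Guard against division by zero when an event is never predicted or never present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Illustrative labels only: "gold" is the hand-labelled truth,
# "predicted" is whatever the detector under test returned.
gold = ["wedding", "none", "death", "wedding", "birth", "none"]
predicted = ["wedding", "wedding", "death", "none", "birth", "none"]

for event in ("birth", "wedding", "death"):
    p, r = precision_recall(gold, predicted, event)
    print(f"{event}: precision={p:.2f} recall={r:.2f}")
```

Even a few dozen labelled documents give a rough answer to the precision and recall questions raised above, and the same sample can later serve as a test set if you do move to a trained classifier.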

At this stage you should not even be worried about which classifier to use. I suggest writing a handful of regular expressions to see how hard your problem is and what performance you get. It may be that a dozen regular expressions get you 90% of these events, and you can avoid over-engineering the problem. Good luck!
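As a rough illustration of the regular-expression baseline, here is a minimal sketch. The patterns, the event names, and the `detect_events` helper are all assumptions chosen for the example, not a vetted list; real text will need patterns tuned against your own data.

```python
import re

# Hypothetical starter patterns, a few per event type.
EVENT_PATTERNS = {
    "birth": [r"\bgave birth\b", r"\bwas born\b", r"\bnewborn\b", r"\bbaby (?:boy|girl)\b"],
    "new_job": [r"\bnew job\b", r"\bgot hired\b", r"\bstarted working at\b"],
    "wedding": [r"\bwedding\b", r"\bgot married\b", r"\btied the knot\b"],
    "death": [r"\bpassed away\b", r"\bdied\b", r"\bfuneral\b"],
}

# Compile once, case-insensitively, so repeated calls stay cheap.
COMPILED = {
    event: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for event, patterns in EVENT_PATTERNS.items()
}


def detect_events(text):
    """Return a sorted list of event types with at least one matching pattern.

    An empty list means no event was detected.
    """
    return sorted(
        event
        for event, patterns in COMPILED.items()
        if any(pattern.search(text) for pattern in patterns)
    )


print(detect_events("Our baby girl was born last night!"))
print(detect_events("The weather is nice today."))
```

Returning a list rather than a single label lets one text carry several events, and the empty list covers the "no event" case without needing any training examples for it.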


 