Naive Bayes text classification calculation: better to do in MySQL or Java?
The class-conditional probability in Naive Bayes is calculated as
P(t|c) = Log2((n1+1)/(n2+n3))
Where
Which one is faster: doing the calculation in MySQL or in Java (of course, we need to fetch the data from MySQL to use it in Java)?
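For reference, the arithmetic on the Java side is small. The following is a minimal sketch, assuming that n1, n2, and n3 are non-negative counts already read from MySQL; the class name `ClassConditionalLogProb` and the method `log2Prob` are hypothetical names used only for illustration.

```java
// Sketch only: n1, n2, n3 are assumed to be counts already fetched from MySQL.
final class ClassConditionalLogProb {

    /** Returns log2((n1 + 1) / (n2 + n3)), the expression given in the question. */
    static double log2Prob(long n1, long n2, long n3) {
        long denominator = n2 + n3;
        if (n1 < 0 || denominator <= 0) {
            throw new IllegalArgumentException("counts must give a positive denominator");
        }
        // Math.log is the natural logarithm; dividing by ln(2) converts to base 2.
        return Math.log((n1 + 1.0) / denominator) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // (3 + 1) / (10 + 6) = 0.25, so the result is close to -2.
        System.out.println(log2Prob(3, 10, 6));
    }
}
```

The per-term computation itself is cheap in either environment, so the cost that matters is likely to be gathering the counts and moving them between MySQL and Java.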
The Naive Bayes classifier is computationally simple, but it requires lots of data manipulation. When applied to text, you are generally looking for a lot of different terms inside the text.
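As a rough illustration of the kind of data manipulation involved, here is a minimal Java sketch that counts how often each term occurs in a piece of text. The tokenization (lowercasing and splitting on anything that is not a letter or digit) and the name `TermCounter` are assumptions made for this example, not something prescribed by the classifier.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch only: a simple term counter with an assumed tokenization rule.
final class TermCounter {

    /** Counts occurrences of each lowercased term in the given text. */
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countTerms("Spam spam, eggs and spam"));
    }
}
```

Counts like these, accumulated per class over the whole corpus, are what the probability calculation consumes, which is why the volume of data tends to matter more than the arithmetic.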
I have a natural bias toward doing these types of calculations in SQL. I would at least argue that MySQL is a reasonable environment for doing this. Depending on the exact nature of the problem and the structure of your data, you might find that full-text indexing is helpful.
I would be wary about working with a large corpus (many tens or hundreds of gigabytes) on the application side.
My book "Data Analysis Using SQL and Excel" has a chapter devoted to Naive Bayes and similar types of models.