
Scalable automatic email classification service

We're currently working on an application that enables users to register one or more email accounts so that their emails can be classified automatically. The front end has been implemented in Ruby; the back end (the email classifier), however, is written in Java and uses the WEKA API. The question is how we can integrate the front end (a web interface written in Ruby) with the back end (the email classifier written in Java) in a scalable way, i.e. one that handles a large number of users simultaneously.

I am not sure what an email classifier is. But for any similar problem, I recommend creating a RESTful API for your Java service; with the right tools this can be done very elegantly. The API should be served over HTTP and return JSON. Use a library such as Jackson to serialize your objects to JSON.
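A minimal sketch of such a Java service, using only the JDK's built-in `com.sun.net.httpserver` package so it runs without extra dependencies. The `/classify` path, the `label` response field, and the keyword-based `classify` stub are hypothetical placeholders for the real WEKA model; the small hand-written JSON helper stands in for Jackson's `ObjectMapper.writeValueAsString(...)`, which you would normally use instead.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;

public class ClassifierService {

    // Hypothetical stand-in for the WEKA model. A real service would turn the
    // email into a WEKA Instance and call classifier.classifyInstance(...).
    static String classify(String emailBody) {
        return emailBody.toLowerCase().contains("invoice") ? "finance" : "general";
    }

    // Minimal JSON string escaping; in production, let Jackson do this.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    static String toJson(String label) {
        return "{\"label\":" + jsonString(label) + "}";
    }

    // Handles POST /classify: the request body is the raw email text.
    static void handle(HttpExchange exchange) throws IOException {
        if (!"POST".equals(exchange.getRequestMethod())) {
            exchange.sendResponseHeaders(405, -1); // -1 = no response body
            exchange.close();
            return;
        }
        String body;
        try (InputStream in = exchange.getRequestBody()) {
            body = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        byte[] response = toJson(classify(body)).getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json");
        exchange.sendResponseHeaders(200, response.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(response);
        }
    }

    public static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/classify", ClassifierService::handle);
        // A fixed pool bounds how many requests one machine handles at once.
        server.setExecutor(Executors.newFixedThreadPool(8));
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
    }
}
```

Running `main` serves `POST /classify` on port 8080 and answers with a body such as `{"label":"finance"}`. The pool size of 8 is an arbitrary choice for the sketch.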

On the Ruby side, you can easily parse and deserialize that JSON.

This is a very scalable solution because HTTP calls are stateless: any instance of the service can answer any request, and each request is handled on a thread that keeps no per-user state once the call completes. If you need more capacity, just add more machines running the Java service.
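Because no per-user state lives on the server, requests can be spread freely across identical instances. A load balancer in front of the machines would normally do this; the sketch below shows the same idea on the client side as a simple round-robin choice between backend base URLs (the URLs themselves are hypothetical).

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RoundRobinBackends {
    private final List<String> baseUrls;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinBackends(List<String> baseUrls) {
        if (baseUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.baseUrls = List.copyOf(baseUrls);
    }

    // Thread-safe: each call takes the next backend in turn.
    // floorMod keeps the index valid even after the counter overflows.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), baseUrls.size());
        return baseUrls.get(i);
    }
}
```

Adding a machine then amounts to adding its URL to the list; no session data has to move.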

The Rails app could also cache some calls, but at this stage that would be premature optimization.

If no processing logic is involved and the two apps only need the same data, just share a common database between them. But it sounds like the Java app needs to do real work, and exposing that work through an API is a common approach. It also doesn't tie you to Ruby: you could expose a JSONP service for AJAX calls, or serve any other client that understands JSON.

If you want alerts for new email, just reverse the direction of the RESTful API: instead of exposing the Java app as a RESTful API, expose an API on the Rails app, for example `/user/ID/newmail`.

The Java app would then call the Rails app when a new email arrives. 然后,当新电子邮件到达时,Java应用程序将调用Rails应用程序。
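A minimal sketch of that callback from the Java side, using the JDK 11+ `java.net.http.HttpClient`. The Rails base URL, the JSON body shape, and the restriction of labels to simple identifiers (which keeps the hand-built JSON valid) are assumptions for illustration; the Rails app would need a matching route for `/user/ID/newmail`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class NewMailNotifier {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();
    private final String railsBaseUrl; // hypothetical, e.g. "http://localhost:3000"

    public NewMailNotifier(String railsBaseUrl) {
        this.railsBaseUrl = railsBaseUrl;
    }

    // Builds POST {railsBaseUrl}/user/{userId}/newmail with a small JSON body.
    public HttpRequest buildRequest(long userId, String label) {
        // Assumption: labels are simple identifiers, so no JSON escaping is needed.
        if (!label.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("unexpected label: " + label);
        }
        String json = "{\"label\":\"" + label + "\"}";
        return HttpRequest.newBuilder(URI.create(railsBaseUrl + "/user/" + userId + "/newmail"))
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Sends the notification and returns the HTTP status code from Rails.
    public int notifyNewMail(long userId, String label) throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(buildRequest(userId, label), HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

For example, `new NewMailNotifier("http://localhost:3000").notifyNewMail(42, "finance")` would POST `{"label":"finance"}` to `/user/42/newmail`.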

Btw:

How did you implement a scalable system in Java for checking thousands of email accounts?

As the amount of data you use to train the classifier grows, you may want to switch to ensemble algorithms, where a group of n nodes forms the ensemble, and split the training data across those n nodes so that each one trains on its own share.
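The splitting step can be sketched as dealing the training examples round-robin into n disjoint shards, one per node. How the data is divided is an assumption here (random or stratified splits are equally valid choices); with WEKA, each shard would then become an `Instances` set used to train that node's classifier.

```java
import java.util.ArrayList;
import java.util.List;

public class TrainingDataSplitter {
    // Deals examples round-robin into n shards, so shard sizes differ by at most one.
    public static <T> List<List<T>> split(List<T> examples, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> shards = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < examples.size(); i++) {
            shards.get(i % n).add(examples.get(i));
        }
        return shards;
    }
}
```

Splitting 7 examples over 3 nodes gives shards of sizes 3, 2 and 2.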

To classify a new datapoint, you can use a voting system in which each of the n nodes "votes" on the class the new datapoint should be assigned. The classification with the most votes wins.
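Once each node has returned its prediction for the new datapoint (for example over HTTP), the majority vote itself is a simple count. A sketch, assuming predictions arrive as label strings; the tie-breaking rule is an arbitrary choice. If all classifiers live in one JVM, WEKA's `weka.classifiers.meta.Vote` meta-classifier may be worth a look instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MajorityVote {
    // Returns the label predicted by the most nodes. On a tie, the label
    // that reached the winning count first (in input order) wins.
    public static String vote(List<String> predictions) {
        if (predictions.isEmpty()) {
            throw new IllegalArgumentException("no predictions");
        }
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : predictions) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }
}
```

For example, three nodes answering `spam`, `work`, `spam` yield `spam`.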

