
I want to implement a machine learning or deep learning model for text classification (100 classes)

I have a dataset similar to the ones that pair movie plots with their genres. There are around 100 classes. Which algorithm should I choose for this 100-class classification? The classification is multi-label, because one movie can have multiple genres. Please recommend one of the following. You are free to suggest any other model if you want to.

1. Naive Bayes
2. Neural networks
3. SVM
4. Random forest
5. k-nearest neighbours

It would be useful if you also named the necessary Python library.

An important step in machine learning engineering is inspecting the data properly. This gives you insight that helps determine which algorithm to choose. Sometimes you may need to try more than one algorithm and compare the resulting models, to be sure that you have done your best on the data.
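As a rough illustration of what inspecting the data could mean for a multi-label problem, the sketch below counts how often each genre occurs and how many genres a sample carries on average. The toy `samples` list and the helper `label_statistics` are hypothetical and only stand in for the real dataset, which was not shared.

```python
from collections import Counter

# Hypothetical toy data: each sample is (plot text, list of genre labels).
samples = [
    ("A detective hunts a killer in a rainy city.", ["crime", "thriller"]),
    ("Two strangers fall in love on a train.", ["romance"]),
    ("A crew fights aliens on a distant moon.", ["sci-fi", "action", "thriller"]),
    ("A retired cop is pulled into one last case.", ["crime", "action"]),
]


def label_statistics(data):
    """Return per-label counts and the mean number of labels per sample."""
    counts = Counter(label for _, labels in data for label in labels)
    cardinality = sum(len(labels) for _, labels in data) / len(data)
    return counts, cardinality


label_counts, label_cardinality = label_statistics(samples)

# Labels sorted from most to least frequent; rare labels end up at the bottom.
for label, count in label_counts.most_common():
    print(f"{label:10s} {count}")
print(f"average labels per sample: {label_cardinality:.2f}")
```

With around 100 classes, a long tail of genres that have only a handful of examples is worth spotting early, since it affects both the choice of model and how the results should be evaluated.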

Since you did not disclose your data, I can only give you the following advice. If your data is "easy", meaning that only a few features and simple combinations of them are needed to solve the task, use Naive Bayes or k-nearest neighbors. If your data is "medium" hard, use Random Forest or SVM. If solving the task requires a very complicated decision boundary that combines many feature dimensions in a non-linear fashion, choose a neural network architecture.

I suggest you use Python and the scikit-learn package for SVM, Random Forest, or k-NN. For neural networks, use Keras.
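A minimal sketch of how such a comparison might look in scikit-learn for the multi-label case, assuming TF-IDF features and a one-vs-rest wrapper (both are my assumptions, not something the question specifies). The `plots` and `genres` lists are hypothetical toy data; Naive Bayes and a linear SVM are shown only as two example candidates.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the real plots and genres.
plots = [
    "A detective hunts a killer in a rainy city.",
    "Two strangers fall in love on a train.",
    "A crew fights aliens on a distant moon.",
    "A retired cop is pulled into one last murder case.",
    "A couple reunites after years apart.",
    "A scientist builds a robot that learns to feel.",
]
genres = [
    ["crime", "thriller"],
    ["romance"],
    ["sci-fi", "action"],
    ["crime", "action"],
    ["romance", "drama"],
    ["sci-fi", "drama"],
]

# Turn the genre lists into a binary indicator matrix, one column per genre.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(genres)

# Candidate base classifiers; each is trained once per genre (one-vs-rest).
candidates = {
    "naive_bayes": MultinomialNB(),
    "linear_svm": LinearSVC(),
}

new_plot = ["A cop chases a murderer through the city."]
predictions = {}
for name, estimator in candidates.items():
    model = make_pipeline(TfidfVectorizer(), OneVsRestClassifier(estimator))
    model.fit(plots, Y)
    predictions[name] = model.predict(new_plot)
    # Map the predicted indicator row back to genre names.
    print(name, binarizer.inverse_transform(predictions[name]))
```

On real data the candidates should be compared on a held-out split with a multi-label metric such as micro-averaged F1 (`sklearn.metrics.f1_score` with `average="micro"`), rather than by looking at a single prediction as done here.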

I am sorry that I cannot give you THE recipe you might expect for solving your problem. Your question is posed very broadly.
