
How do I classify reviews as good or bad using NLP for the dataset that I have?

Original post: 2019-11-30 06:43:16 5 1 python/ nlp/ classification/ data-analysis

The TSV file of the data. Things I have tried in Jupyter.

I have customer review data for a clothing e-commerce store. I am learning NLP with Python in a Jupyter notebook, and I want to learn how to classify the reviews in the 'Review Text' column as good or bad.


So far I have imported the file using a delimiter, cleaned the 'Review Text' column, and tokenised the data: converting to lower case, removing stopwords, stemming, and splitting into words.
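For illustration, the preprocessing described above might look roughly like the sketch below. The stopword set and the `crude_stem` helper are hypothetical stand-ins that keep the example self-contained; they are not what the original notebook used, and a real project would more likely rely on a full stopword list and a proper stemmer such as Porter's.

```python
import re

# Hypothetical, deliberately tiny stopword list (assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "was", "i", "to", "of", "in"}


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    if word.endswith("ss"):
        return word  # leave words like "dress" alone
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_review(text):
    """Keep letters only, lower-case, drop stopwords, stem, and rejoin."""
    text = re.sub(r"[^a-zA-Z]", " ", text)
    tokens = text.lower().split()
    tokens = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    return " ".join(tokens)


print(clean_review("I loved this dress, it fits perfectly!"))
```

Returning a single cleaned string per review (rather than a token list) keeps the output ready for a bag-of-words vectorizer.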

Please help me with this project. I have tried learning from a few blogs, but it hasn't helped much.

1 Solution

Looking at your dataset, I assume you can take the "Review Text" column as the independent variable and the "Positive feedback" column, which consists of 0's and 1's, as the dependent variable.
Step 1: Apply stemming to the "Review Text" column.
Step 2: Split the text into words, convert them to lower case, and use regular expressions to strip out unwanted characters.
Step 3: Use a count vectorizer.
Step 4: Run train_test_split with x = "Review Text" and y = "Positive feedback".
Step 5: Use any classifier to classify the reviews into 0 and 1.
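Steps 3 to 5 can be sketched with scikit-learn roughly as follows. This is only a minimal illustration under stated assumptions: the `reviews` and `labels` lists are made-up toy data standing in for the cleaned "Review Text" column and the 0/1 label column, and `MultinomialNB` is just one possible choice for "any classifier".

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up toy data (assumption): replace with the real columns from the TSV file.
reviews = [
    "love this dress fits perfectly",
    "great quality and very comfortable",
    "beautiful color love the fabric",
    "perfect fit great price",
    "so comfortable and flattering love it",
    "great top beautiful and soft",
    "terrible quality fell apart",
    "poor fit and cheap fabric",
    "awful color very disappointed",
    "cheap material returned it",
    "disappointed terrible fit returned",
    "poor quality awful stitching",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Step 4: hold out part of the data for evaluation.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=0, stratify=labels
)

# Step 3: bag-of-words counts; the vocabulary is learned from the training split only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

# Step 5: train a classifier and predict 0/1 for the held-out reviews.
clf = MultinomialNB()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))

# Classify a new, unseen review.
new_pred = clf.predict(vectorizer.transform(["love great beautiful comfortable perfect"]))
print("new review label:", new_pred[0])
```

The sketch splits before fitting the vectorizer so that the test reviews do not influence the learned vocabulary; vectorizing first and splitting afterwards, as the step order above suggests, also works for a first experiment.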

For further guidance, see this link: https://www.kaggle.com/apekshakom/sentiment-analysis-of-restaurant-reviews