
How do I classify reviews as good or bad using NLP for the dataset that I have?

Attached: the TSV file of the data, and the things I have tried in Jupyter.

I have customer review data for a clothing e-commerce store. I am learning NLP with Python in a Jupyter notebook, and I want to learn how to classify the reviews in the 'Review Text' column as good or bad using NLP.


So far I have imported the file and split it on the delimiter, cleaned the 'Review Text' column, and tokenised the data: converting to lower case, removing stopwords, stemming, and splitting into words.
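The preprocessing described above might look roughly like the sketch below. It uses only the standard library; the `STOPWORDS` set and the `crude_stem` helper are hypothetical stand-ins (not from the original post) for a full stopword list and a real stemmer such as NLTK's `PorterStemmer`.

```python
import re

# Tiny illustrative stopword set (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "is", "it", "and", "this", "i", "was", "to", "of"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words like "dress" alone
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Keep letters only, lower-case, split, drop stopwords, then stem."""
    words = re.sub(r"[^a-zA-Z]", " ", text).lower().split()
    return [crude_stem(w) for w in words if w not in STOPWORDS]


tokens = preprocess("I loved this dress, it fits perfectly!")
print(tokens)
```

Applying `preprocess` to every row of the 'Review Text' column would give one token list per review.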

Please help me with this project. I have tried learning from a few blogs, but that hasn't helped much.

Looking at your dataset, I assume you can take the "Review Text" column as the independent variable and the "Positive feedback" column, which consists of 0s and 1s, as the dependent variable.

Step 1: Stem the words in the "Review Text" column.
Step 2: Split the text into words, convert them to lower case, and use regular expressions to remove unwanted characters.
Step 3: Use a count vectorizer to turn the text into features.
Step 4: Run train_test_split with x = "Review Text" and y = "Positive feedback".
Step 5: Train any classifier to classify the reviews as 0 or 1.
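A minimal sketch of Steps 2 to 5, assuming scikit-learn is installed. The reviews and labels here are made-up placeholders for the real "Review Text" and "Positive feedback" columns, `MultinomialNB` is just one possible choice for "any classifier", and stemming (Step 1) is left out to keep the example short.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Made-up placeholder data; in practice these come from the TSV file.
reviews = [
    "Love this dress, fits perfectly",
    "Great quality and very comfortable",
    "Beautiful colour, highly recommend",
    "Nice fabric and lovely design",
    "Terrible fit, returned it",
    "Poor quality, fell apart quickly",
    "Very disappointed, looks cheap",
    "Awful material and bad stitching",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def clean(text):
    # Step 2: keep letters only, lower-case, and normalise whitespace.
    return " ".join(re.sub(r"[^a-zA-Z]", " ", text).lower().split())


corpus = [clean(r) for r in reviews]

# Step 3: bag-of-words counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

# Step 4: hold out a quarter of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

# Step 5: fit a classifier and predict 0/1 for the held-out reviews.
clf = MultinomialNB()
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)
print(predictions, accuracy)
```

With a dataset this small the accuracy figure means very little; the point is only the shape of the workflow.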

For further guidance, see this link: https://www.kaggle.com/apekshakom/sentiment-analysis-of-restaurant-reviews
