
Preprocessing data in Multi-label classification Python

My dataset structure:

Text: 'Good service, nice view, location'
Tag: '{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, {LOCATION#GENERAL, positive}'

The point is that I don't know how to structure my data frame. Any recommendations would be much appreciated. Thank you.

Separate the adjectives (good, bad, etc.) from the hotel attributes (service, view, location). You can start by creating a custom dictionary, then automatically detect new words and use them as categories. You could use named entity recognition to do so; here are some articles:

https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da
https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175

Personally, I have used the Stanford one and found it pretty good.
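
For the data frame itself, one possible layout is sketched below. It assumes the `Tag` column always follows the `{ASPECT#CATEGORY, polarity}` pattern shown in the question, splits each tag into its attribute and its sentiment, and then builds one 0/1 column per attribute/sentiment combination with scikit-learn's `MultiLabelBinarizer`. The second row of the toy data, the `parse_tags` helper, and the `pairs`/`labels` column names are made up for illustration and are not part of the original dataset.

```python
import re

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data in the format from the question (the second row is invented).
df = pd.DataFrame({
    "Text": ["Good service, nice view, location", "Bad service"],
    "Tag": [
        "{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, "
        "{LOCATION#GENERAL, positive}",
        "{SERVICE#GENERAL, negative}",
    ],
})

# Matches one "{ASPECT, polarity}" group and captures both parts.
TAG_PATTERN = re.compile(r"\{\s*([^,{}]+?)\s*,\s*([^,{}]+?)\s*\}")


def parse_tags(tag_string):
    """Return the (aspect, polarity) pairs found in one Tag string."""
    return TAG_PATTERN.findall(tag_string)


# Keep the attribute and the sentiment as separate pieces of each pair.
df["pairs"] = df["Tag"].apply(parse_tags)

# Assumption: each aspect/polarity combination is treated as one label.
df["labels"] = df["pairs"].apply(
    lambda pairs: [f"{aspect}:{polarity}" for aspect, polarity in pairs]
)

# Multi-hot encode the label lists: one column per label, one row per text.
mlb = MultiLabelBinarizer()
Y = pd.DataFrame(
    mlb.fit_transform(df["labels"]), columns=mlb.classes_, index=df.index
)

print(Y)
```

`df["Text"]` can then serve as the model input and `Y` as the multi-label target. If you would rather predict the attribute and the sentiment separately, the `pairs` column already keeps them apart.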
