
Preprocessing data for multi-label classification in Python

My dataset structure:

Text: 'Good service, nice view, location'
Tag: '{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, {LOCATION#GENERAL, positive}'

The problem is that I don't know how to structure my data frame. Any recommendations would be much appreciated. Thank you.
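For illustration, one possible layout is a row per review with a column per aspect label, where each cell holds the polarity. The sketch below parses the Tag string with a regular expression, assuming every tag follows the `{LABEL, polarity}` shape shown above. The helper names, the second sample review, and the `"none"` placeholder for absent labels are assumptions made for this example, not part of the original dataset.

```python
import re

# Matches one "{LABEL, polarity}" pair inside the Tag string.
TAG_PATTERN = re.compile(r"\{\s*([^,{}]+?)\s*,\s*([^,{}]+?)\s*\}")


def parse_tags(tag_string):
    """Return a list of (label, polarity) tuples found in the Tag string."""
    return TAG_PATTERN.findall(tag_string)


def to_rows(samples):
    """Turn (text, tag_string) pairs into dicts with one key per label."""
    parsed = [(text, parse_tags(tags)) for text, tags in samples]
    # Collect every label seen in the data so all rows share the same columns.
    labels = sorted({label for _, pairs in parsed for label, _ in pairs})
    rows = []
    for text, pairs in parsed:
        row = {"text": text}
        row.update({label: "none" for label in labels})  # label absent by default
        row.update(dict(pairs))                          # overwrite with real polarity
        rows.append(row)
    return rows


samples = [
    ("Good service, nice view, location",
     "{SERVICE#GENERAL, positive}, {HOTEL#GENERAL, positive}, "
     "{LOCATION#GENERAL, positive}"),
    # Hypothetical second review, added only to show a missing label.
    ("Nice location", "{LOCATION#GENERAL, positive}"),
]

rows = to_rows(samples)
for row in rows:
    print(row)
```

If pandas is available, `pandas.DataFrame(rows)` should turn this list of dicts into a data frame with a `text` column plus one column per label.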

Separate the opinion words (good, bad, etc.) from the hotel attributes (service, view, location). You can start by creating a custom dictionary, then automatically detect new words and add them as categories. Named entity recognition (NER) could help with this; here are some articles:

https://towardsdatascience.com/named-entity-recognition-with-nltk-and-spacy-8c4a7d88e7da
https://towardsdatascience.com/a-review-of-named-entity-recognition-ner-using-automatic-summarization-of-resumes-5248a75de175

Personally, I have used the Stanford NER tagger, and it worked pretty well.
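As a rough sketch of the custom-dictionary idea (not of an NER model), the snippet below splits a review into aspect terms and opinion words using two small seed dictionaries. Both dictionaries, the `split_review` helper, and the mapping of "view" to `HOTEL#GENERAL` are assumptions made for illustration. Words that match neither dictionary are returned separately, so they can be reviewed as candidates for new entries.

```python
import re

# Hypothetical seed dictionaries; extend them as new words are reviewed.
OPINION_WORDS = {"good": "positive", "nice": "positive", "bad": "negative"}
ASPECT_TERMS = {
    "service": "SERVICE#GENERAL",
    "view": "HOTEL#GENERAL",       # assumed mapping, inferred from the sample row
    "location": "LOCATION#GENERAL",
}


def split_review(text):
    """Split a review into (aspects, opinions, unknown) lists of lowercase words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    aspects = [t for t in tokens if t in ASPECT_TERMS]
    opinions = [t for t in tokens if t in OPINION_WORDS]
    # Words in neither dictionary are candidates for new categories.
    unknown = [t for t in tokens
               if t not in ASPECT_TERMS and t not in OPINION_WORDS]
    return aspects, opinions, unknown


aspects, opinions, unknown = split_review("Good service, nice view, location")
print(aspects)   # aspect terms found in the text
print(opinions)  # opinion words found in the text
print(unknown)   # words not yet covered by either dictionary
```

A plain dictionary lookup like this ignores word order and negation, so it is only a starting point before moving to a trained NER model.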

