Which kinds of features are good to extract from text for author identification classification?

Original 2020-10-29 09:30:03 4 1 python/ pandas/ classification

I want to classify texts by author for an author identification task...
The features might be:
the length of the author's text
or lexical features of the author's text... Can anybody help with which kinds of features can improve the classification results? The sample data frame I gathered looks like this...

Each text is 4 sentences long, and I have at least 18 authors. About the classification: this task is my thesis, and I cannot "just" apply classification to the raw text; the goal is to apply classification to features that are extracted from the text... I want to know which kinds of features can help me improve classification accuracy (with both ML approaches and neural networks).
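To make the question concrete, here is a minimal sketch of the kind of length and lexical features mentioned above, computed with the standard library only. The function name `stylometric_features`, the short `FUNCTION_WORDS` list, and the sample sentence are illustrative assumptions, not part of the original post.

```python
import re
import string

# Small illustrative list of English function words (an assumption; real
# stylometry work usually relies on a much longer list).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "i")


def stylometric_features(text):
    """Return a dict of simple length, lexical, and punctuation features."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on runs of '.', '!' or '?'; good enough for a sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty text

    features = {
        "n_chars": len(text),
        "n_words": len(words),
        "n_sentences": len(sentences),
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "type_token_ratio": len(set(words)) / n_words,
        "punct_per_char": sum(c in string.punctuation for c in text)
        / max(len(text), 1),
    }
    # Relative frequency of each function word.
    for fw in FUNCTION_WORDS:
        features["fw_" + fw] = words.count(fw) / n_words
    return features


sample = "The cat sat. The dog ran!"
feats = stylometric_features(sample)
print(feats)
```

Applied to every row of a data frame (for example over a hypothetical `text` column), the resulting dicts can be collected into a numeric feature table and passed to any classifier.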

1 answer

How long are your texts? You can try deriving tf-idf vectors for each document and then performing a kNN search over your dataset. A more sophisticated way is to featurize your texts with a neural network and then perform kNN on those vectors. If your dataset is big enough, there are not too many authors, and there are several texts for each author, you could try fine-tuning a neural network to classify your texts. But I would go for kNN over the neural-net features.
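The tf-idf plus kNN suggestion could look roughly like the sketch below, assuming scikit-learn is available. The toy corpus, the author labels, and the choices of `n_neighbors=1` and cosine distance are assumptions made for illustration; the answer does not specify them.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus (made up for illustration): two authors, two texts each.
texts = [
    "the sea was calm and the ship sailed slowly at dawn",
    "the ship rolled on the grey sea all night",
    "markets fell sharply as investors sold their shares",
    "investors bought shares when the markets recovered",
]
authors = ["author_a", "author_a", "author_b", "author_b"]

# tf-idf turns each text into a sparse vector; kNN then assigns the author
# of the closest training text, measured by cosine distance.
model = make_pipeline(
    TfidfVectorizer(),
    KNeighborsClassifier(n_neighbors=1, metric="cosine"),
)
model.fit(texts, authors)

queries = ["a ship on the calm sea", "investors sold shares"]
predicted = model.predict(queries)
print(list(zip(queries, predicted)))
```

With real data, `n_neighbors` and the vectorizer settings would need tuning with cross-validation, and with only a few sentences per text the tf-idf vectors will be very sparse.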

Extract text features from dataframe

How to extract the text from Google features in Python?

BeautifulSoup, trying to extract text from anchor tags that contain author names

Extract text from an image with good accuracy

Extract features from text file and train them to classifier

Text identification

How to extract features from FFT?

how do i pick the second div from the code without any kind of identification?

Extract the features from Doc2Vec in Python

Is there any function to extract the text which has a specific heading from pdf


The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2020-10-29 10:06:08
