How to classify and count the number of words in Python

Question

I have a dataset of comments from Twitter (e.g. 10 instances). I want to group identical words and count how often each one appears, using scikit-learn in Python, and get output like the following:

**Dataset:**

    comment_text
    r u cmng or u not cmng
    I am fine, r u fine
    my frnd is gr8, wll dn.
    we r nt going tday
    I have a fever.

The output should look like this:

    Words     Count
    u         3
    r         3
    i         2
    cmng      2
    fine,     1
    wll       1
    have      1
    fever.    1
    not       1
    tday      1
    my        1
    we        1
    a         1
    or        1
    nt        1
    going     1
    fine      1
    dn.       1
    gr8,      1
    frnd      1
    am        1
    is        1
    dtype: int64

I used the following code, but it shows the wrong output:

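    from sklearn.feature_extraction.text import TfidfVectorizer

    # train_dataset_male is the asker's pandas DataFrame (not shown)
    text = train_dataset_male['comment_text']
    print(text)
    vectorizer = TfidfVectorizer()
    # tokenize and build vocab
    vectorizer.fit(text)
    # summarize: word -> column index, then the learned IDF weights
    print(vectorizer.vocabulary_)
    print(vectorizer.idf_)
    # encode the first comment as a TF-IDF vector
    vector = vectorizer.transform([text[0]])
    # summarize encoded vector
    print(vector.shape)
    print(vector.toarray())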

Answer 1

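Python has a handy module in the standard library called "collections" for this kind of thing. Its Counter class is a dictionary subclass that keeps track of individual items and counts how many times each one appears in an iterable (list, tuple, etc.). Since each comment is a whole sentence, split the comments into words first and count the words:

    from collections import Counter

    # dataset: an iterable of comment strings, e.g. train_dataset_male['comment_text']
    words = [word for comment in dataset for word in comment.lower().split()]
    text_counter = Counter(words)
    # times the word "u" is seen (a missing word gives 0 here; .get() would give None)
    print(text_counter["u"])
    # every word with its count, most frequent first
    print(text_counter.most_common())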
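The expected output in the question (with its dtype: int64 footer) looks like a pandas Series, and the comments already live in a DataFrame column, so the same count can also be done directly in pandas. A minimal sketch, assuming pandas 0.25 or newer (for Series.explode); the small DataFrame below is a stand-in built from the question's five comments, not the asker's real train_dataset_male.

```python
import pandas as pd

# Stand-in for the asker's DataFrame: the five comments from the question.
train_dataset_male = pd.DataFrame({"comment_text": [
    "r u cmng or u not cmng",
    "I am fine, r u fine",
    "my frnd is gr8, wll dn.",
    "we r nt going tday",
    "I have a fever.",
]})

# Lowercase each comment, split it on whitespace into a list of words,
# give every word its own row, then count occurrences (most frequent first).
word_counts = (
    train_dataset_male["comment_text"]
    .str.lower()
    .str.split()
    .explode()
    .value_counts()
)
print(word_counts)
```

Words that tie (all those counted once) may be listed in a different order than in the question's table; the counts themselves are the same.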

Answer 1: accepted, score 0, 2019-11-09 17:26:38

How to classifying and count the number of words in Python

Question

1 answers

solution1 0 ACCPTED 2019-11-09 17:26:38

solution1
0 ACCPTED 2019-11-09 17:26:38