I'm making a model to predict whether a piece of news is fake or real based on its headline. I built a bag of words from my headlines. However, when I let the user input their own headline, a new bag of words is formed containing only the words from the user's headline. I want the prediction for the user's input to be based on the existing bag of words. How would I go about doing this? Here is some of my code:
headline_bow = CountVectorizer()
b = headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences) #here is the bag of words
dictionary = headline_bow.get_feature_names()
# Trying to use new data (new headlines that were not part of the dataset)
new_data = []
test = ['est', "test", "test"]
headline = input('Type headline here: ')  # renamed so the built-in input() is not shadowed
new_data.append(word_tokenize(headline))
print(new_data)
new_data2 = [" ".join(x) for x in new_data]
new_vecto = headline_bow.fit(new_data2)
new_vector = headline_bow.transform(new_data2)
print(new_vector) #this part makes a separate bag of words for some reason instead of using the existing bag of words
Separate bag of words:
(0, 0) 1
(0, 1) 1
(0, 2) 1
Existing bag of words:
(0, 765) 1
(0, 1789) 1
(0, 2227) 1
(0, 2309) 1
(0, 2508) 1
(0, 3244) 1
(0, 3276) 1
(0, 4970) 1
(0, 5151) 1
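Just don't call fit on the new data. Calling fit again relearns the vocabulary from only the new headline, which is why you get a separate bag of words. Calling transform alone reuses the vocabulary learned from the original headlines. Remove this line: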
new_vecto = headline_bow.fit(new_data2)
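As a minimal sketch of the fix, the example below fits the vectorizer once on the training headlines and then only transforms the new headline. The three training headlines and the new headline are made-up stand-ins for lower_sentences and the user input, and the names train_headlines and train_matrix are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for lower_sentences from the question.
train_headlines = [
    "scientists discover water on mars",
    "celebrity spotted on mars with aliens",
    "government announces new tax plan",
]

headline_bow = CountVectorizer()
# Fit exactly once: this learns the vocabulary from the training headlines.
train_matrix = headline_bow.fit_transform(train_headlines)

# Made-up user headline; in the question this would come from input().
new_headline = ["aliens announce tax plan on mars"]
# Transform only: the new headline is mapped onto the existing vocabulary.
new_vector = headline_bow.transform(new_headline)

# Both matrices share the same columns, so a model trained on
# train_matrix can accept new_vector directly.
print(train_matrix.shape[1], new_vector.shape[1])
```

Words in the new headline that are not in the fitted vocabulary (here "announce") are simply ignored by transform, so the new vector always has the same number of columns as the training matrix.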