How to make user input pull from existing bag of words?

Question

I'm building a model to predict whether a piece of news is fake or real based on its headline. I built a bag of words from my headlines. However, when the user inputs their own headline, a new bag of words is formed containing only the words from the user's headline. I want the prediction for the user's input to be based on the existing bag of words. How would I go about doing this? Here is some of my code:

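from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import word_tokenize

headline_bow = CountVectorizer()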
b = headline_bow.fit(lower_sentences)
a = headline_bow.transform(lower_sentences) #here is the bag of words
dictionary = headline_bow.get_feature_names() 



#TRYING TO USE NEW DATA (NEW HEADLINES THAT WERE NOT PART OF THE DATASET)
new_data = []
test = ['est', "test", "test"]

headline = input('Type headline here')  # renamed so the built-in input() is not shadowed
new_data.append(word_tokenize(headline))
print(new_data)

new_data2 = [" ".join(x) for x in new_data]

new_vecto = headline_bow.fit(new_data2)
new_vector = headline_bow.transform(new_data2)

print(new_vector) #this part makes a separate bag of words for some reason instead of using the existing bag of words

Separate bag of words:

  (0, 0)    1
  (0, 1)    1
  (0, 2)    1

Existing bag of words:

  (0, 765)  1
  (0, 1789) 1
  (0, 2227) 1
  (0, 2309) 1
  (0, 2508) 1
  (0, 3244) 1
  (0, 3276) 1
  (0, 4970) 1
  (0, 5151) 1

Answer 1

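Just don't call fit on the new data. fit relearns the vocabulary from whatever it is given, so calling it on the user's headline replaces the existing bag of words with one built only from that headline. transform on its own maps the new text onto the vocabulary that was learned earlier. Remove this line: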

new_vecto = headline_bow.fit(new_data2)
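
As a minimal sketch of the fix, assuming scikit-learn's CountVectorizer as in the question: the three training headlines and the hard-coded user headline below are made-up stand-ins (the real lower_sentences and the input() call are not reproduced here). The vectorizer is fitted once on the training data, and the new headline is only transformed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for the training headlines (lower_sentences in the question).
lower_sentences = [
    "scientists discover new planet",
    "celebrity spotted on new planet",
    "local team wins championship",
]

headline_bow = CountVectorizer()
# fit (here via fit_transform) learns the vocabulary; do this once, on training data only.
train_matrix = headline_bow.fit_transform(lower_sentences)

# A new headline as a user might type it (hard-coded instead of input() for this sketch).
user_headline = "New planet wins championship trophy"

# transform reuses the fitted vocabulary; words it never saw ("trophy") are ignored.
new_vector = headline_bow.transform([user_headline])

# Both matrices have the same number of columns, one per vocabulary word.
print(train_matrix.shape)
print(new_vector.shape)
```

Because new_vector has the same columns as the training matrix, it should be usable as input to whatever classifier was trained on the original bag of words. Note that transform expects a list of strings, so the raw headline can be wrapped in a list directly; tokenizing it and joining it back together is probably not needed.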

solution1
0 ACCEPTED 2020-04-15 00:06:55