
How to make user input pull from an existing bag of words?

I'm making a model to predict whether a piece of news is fake or real based on its headline. I built a bag of words from my headlines. However, when I have the user input their own headline, a new bag of words is formed containing only the words from the user's headline. I want the prediction for the user input to be based on the existing bag of words. How would I go about doing this? Here is some of my code:

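from sklearn.feature_extraction.text import CountVectorizer
from nltk.tokenize import word_tokenize

headline_bow = CountVectorizer()
b = headline_bow.fit(lower_sentences)  # learn the vocabulary from the dataset headlines
a = headline_bow.transform(lower_sentences)  # here is the bag of words
dictionary = headline_bow.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2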



# TRYING TO USE NEW DATA (NEW HEADLINES THAT WERE NOT PART OF THE DATASET)
new_data = []
test = ['est', "test", "test"]

headline = input('Type headline here: ')  # renamed so the built-in input() is not shadowed
new_data.append(word_tokenize(headline))
print(new_data)

new_data2 = [" ".join(x) for x in new_data]

new_vecto = headline_bow.fit(new_data2)
new_vector = headline_bow.transform(new_data2)

print(new_vector)  # this produces a separate bag of words instead of using the existing one

Separate bag of words:

  (0, 0)    1
  (0, 1)    1
  (0, 2)    1

Existing bag of words:

  (0, 765)  1
  (0, 1789) 1
  (0, 2227) 1
  (0, 2309) 1
  (0, 2508) 1
  (0, 3244) 1
  (0, 3276) 1
  (0, 4970) 1
  (0, 5151) 1

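Don't call fit on the new data. fit relearns the vocabulary from whatever it is given, so after that call the vectorizer only knows the words in the user's headline. transform on its own maps the new headline onto the vocabulary that was learned from the original headlines. Remove this line: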

new_vecto = headline_bow.fit(new_data2)
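
To illustrate, here is a minimal sketch of the fit-once, transform-many pattern. The three training headlines and the hard-coded new headline are made up for the example (they stand in for lower_sentences and the input() call), so treat the data as an assumption rather than anything from the original dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training headlines standing in for lower_sentences.
lower_sentences = [
    "scientists discover water on mars",
    "celebrity secretly an alien says insider",
    "government announces new tax plan",
]

headline_bow = CountVectorizer()
# fit (here via fit_transform) learns the vocabulary; do this once,
# on the training headlines only.
train_matrix = headline_bow.fit_transform(lower_sentences)

# A new headline, hard-coded here instead of read with input().
new_headline = "Alien water plan announced"

# transform only: the fitted vocabulary is reused, so the new vector has
# the same columns as the training matrix. Words that are not in the
# vocabulary ("announced" in this example) are ignored.
new_vector = headline_bow.transform([new_headline])

print(train_matrix.shape, new_vector.shape)
print(new_vector)
```

Because the new vector has the same number of columns as the training matrix, it can be passed directly to a classifier trained on that matrix. Note also that transform accepts raw strings and applies the vectorizer's own lowercasing and tokenization, so the word_tokenize and join steps are likely unnecessary here.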


 