
How to make a bag of words from a text file using the split method in Python

I am trying to learn TF-IDF, but I couldn't build a bag of words from a file.


docA = open("/home/user/Desktop/da/doca","r")
bowA = docA.split(" ")


Traceback (most recent call last)
<ipython-input-32-06e07f9dd975> in <module>
----> 1 bowA = docA.split(" ")

AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
Can anyone help me solve this?

I assume that you meant this:

docA = open("/home/user/Desktop/da/doca","r")
# print(docA.read())
bowA = docA.read().split(" ") # or just split() will do

When you call read(), it consumes the entire file and leaves the file's read position at the end, so calling read() a second time returns an empty string. If you also want to print the content, read it once into a variable, then print it and use it however you like:

docA = open("/home/user/Desktop/da/doca","r")
data = docA.read()
bowA = data.split()
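To see the read-position behaviour directly, here is a small sketch. It assumes nothing about the original file: a throwaway temporary file with made-up text stands in for "doca", and seek(0) is shown as one way to rewind if you really do need to read twice.

```python
import os
import tempfile

# Throwaway file standing in for the question's "doca" (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat on the mat")
    path = tmp.name

with open(path, "r") as docA:
    first = docA.read()   # consumes the whole file
    second = docA.read()  # position is already at the end, so this is ""
    docA.seek(0)          # rewind to the start of the file
    third = docA.read()   # the full content again

os.remove(path)

print(repr(first))     # 'the cat sat on the mat'
print(repr(second))    # ''
print(third == first)  # True
```

Storing the result of a single read() in a variable, as in the answer, is usually simpler than rewinding.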

Or, more simply, use a with statement, which also closes the file for you:

with open("/home/user/Desktop/da/doca","r") as docA:
    data = docA.read()
bowA = data.split()
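Since the end goal is TF-IDF, you will usually want word counts rather than just a list of tokens. A minimal sketch using collections.Counter follows; the lowercasing step and the bag_of_words helper name are illustrative assumptions, and a literal string replaces docA.read() so the snippet runs on its own.

```python
from collections import Counter


def bag_of_words(text):
    """Hypothetical helper: lowercase the text and count whitespace-separated tokens."""
    return Counter(text.lower().split())


# In practice this string would come from docA.read().
data = "The cat sat on the mat"

bowA = data.split()         # plain list of tokens, as in the answer
counts = bag_of_words(data)  # token -> number of occurrences

print(bowA)                  # ['The', 'cat', 'sat', 'on', 'the', 'mat']
print(counts["the"])         # 2
print(counts.most_common(1)) # [('the', 2)]
```

Punctuation is not stripped here, so "mat." and "mat" would count as different words; a real TF-IDF pipeline normally needs a more careful tokenizer.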

You need to call split() on the string returned by read(), not on the file handle itself:

docA = open("/home/user/Desktop/da/doca","r")
document_string = docA.read()
bowA = document_string.split()
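The difference between the file handle and the returned string can be checked directly. This sketch uses a temporary file with made-up content in place of "doca"; it shows that open() in text mode gives an io.TextIOWrapper (the type named in the traceback), which has no split method, while read() gives a plain str.

```python
import io
import os
import tempfile

# Throwaway file standing in for the question's "doca" (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha beta gamma")
    path = tmp.name

with open(path, "r") as docA:
    print(isinstance(docA, io.TextIOWrapper))  # True: a file object, not text
    print(hasattr(docA, "split"))              # False: hence the AttributeError
    document_string = docA.read()              # the file's contents as a str
    print(type(document_string) is str)        # True
    print(document_string.split())             # ['alpha', 'beta', 'gamma']

os.remove(path)
```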

You can just call split() with no arguments; by default it splits on any run of whitespace (spaces, tabs and newlines) and never returns empty strings.
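A quick comparison shows why the no-argument form is usually better for a bag of words. The sample string is made up, with a double space, a tab and newlines, to mimic what a real text file can contain.

```python
text = "the  cat\tsat\non the mat\n"

# Splitting on a single space keeps empty strings and leaves tabs/newlines inside tokens.
print(text.split(" "))
# ['the', '', 'cat\tsat\non', 'the', 'mat\n']

# Splitting on any whitespace gives clean tokens.
print(text.split())
# ['the', 'cat', 'sat', 'on', 'the', 'mat']
```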
