How to make a bag of words from a text file in Python using the split method
I am trying to learn TF-IDF, but I couldn't build a bag of words from a file.
Code:
docA = open("/home/user/Desktop/da/doca","r")
print(docA.read())
bowA = docA.split(" ")
Error:
AttributeError                            Traceback (most recent call last)
<ipython-input-32-06e07f9dd975> in <module>
----> 1 bowA = docA.split(" ")
AttributeError: '_io.TextIOWrapper' object has no attribute 'split'
Can anyone help me solve this?
I assume you meant this:
docA = open("/home/user/Desktop/da/doca","r")
# print(docA.read())
bowA = docA.read().split(" ") # or just split() will do
docA.close()
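Because /home/user/Desktop/da/doca only exists on the asker's machine, here is a self-contained sketch of the same fix that writes a short sample sentence to a temporary file first. The sample text and the use of tempfile are assumptions for illustration only.

```python
import os
import tempfile

# Write sample text to a temporary file; this stands in for the asker's
# /home/user/Desktop/da/doca, which is not available here.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the cat sat on the mat")
    path = tmp.name

docA = open(path, "r")
# Call split() on the string returned by read(), not on the file object.
bowA = docA.read().split(" ")
docA.close()
os.remove(path)  # clean up the temporary file

print(bowA)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```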
When you call read(), it reads the entire file and leaves the read cursor (the file position) at the end, so calling read() again returns an empty string. That is why print(docA.read()) is commented out above: if it ran first, the following docA.read() would only see an empty string. If you also want to print the content, assign it to a variable, print it, and then use it as you wish:
docA = open("/home/user/Desktop/da/doca","r")
data = docA.read()
print(data)
bowA = data.split()
docA.close()
Or, more simply, use a with statement, which closes the file automatically when the block ends:
with open("/home/user/Desktop/da/doca","r") as docA:
    data = docA.read()
    print(data)
    bowA = data.split()
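The read-cursor behaviour described above can be checked without a real file. This sketch uses io.StringIO, an in-memory text stream that follows the same read()/seek() rules as a file opened in text mode; the sample sentence is an assumption.

```python
import io

# In-memory text stream standing in for an open text file.
docA = io.StringIO("the cat sat on the mat")

first = docA.read()   # consumes everything; the position is now at the end
second = docA.read()  # nothing left, so this is an empty string
print(repr(first))    # 'the cat sat on the mat'
print(repr(second))   # ''

docA.seek(0)          # move the position back to the start
third = docA.read()
print(third == first) # True
```

Calling seek(0) on the real file object works the same way, but storing the result of a single read() in a variable, as in the answer, is usually simpler.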
You want to call split() on the string returned by read(), not on the file handle itself:
docA = open("/home/user/Desktop/da/doca","r")
document_string = docA.read()
docA.close()  # the file is no longer needed once its text is in memory
bowA = document_string.split()
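Since the goal is TF-IDF, the list of words usually still has to be turned into a bag of words, meaning word counts with order ignored. Here is a minimal sketch using collections.Counter. The sample sentence and the lowercasing step are assumptions for illustration, and the term-frequency formula shown (count divided by total words) is one common definition, not the only one.

```python
from collections import Counter

# Sample text standing in for the file contents (assumption).
document_string = "The cat sat on the mat"

# Lowercase so "The" and "the" count as the same word (an assumption;
# drop .lower() if case matters for your analysis).
bowA = document_string.lower().split()

# A bag of words keeps only how often each word occurs, not its position.
word_counts = Counter(bowA)
print(word_counts["the"])  # 2
print(word_counts["cat"])  # 1

# Term frequency: occurrences of each word divided by the total word count.
tf = {word: count / len(bowA) for word, count in word_counts.items()}
print(tf["the"])
```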
You can just call split() with no arguments: by default it splits on any run of whitespace (spaces, tabs, and newlines) and drops empty strings.
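The difference between split(" ") and split() matters as soon as the text has double spaces, tabs, or line breaks. A short comparison on a made-up string:

```python
text = "the  cat\tsat\non the mat\n"

# split(" ") splits on each single space only: a double space yields an
# empty string, and tabs/newlines stay inside the tokens.
by_space = text.split(" ")
print(by_space)  # ['the', '', 'cat\tsat\non', 'the', 'mat\n']

# split() with no argument splits on any run of whitespace and drops
# empty strings, which is usually what you want for a bag of words.
by_whitespace = text.split()
print(by_whitespace)  # ['the', 'cat', 'sat', 'on', 'the', 'mat']
```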