Extract text from a file object using .read()
I'm trying to read the source of a website with this code:
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
z.read()
print z
txt = open('music.txt','w')
txt.write(str(z))
txt.close()
for i in open('music.txt','r'):
    if '''onclick="javascript:pageTracker._trackPageview('/clicks/''' in i:
        print i
And all I get for the source code is:
<addinfourl at 51561608L whose fp = <socket._fileobject object at 0x0000000002CCA480>>
It might be an error; I don't know.
Does anyone know of a better way to do the job above without putting it into a text file first?
z is a file object. In fact, your code prints the object's description. You need to store the result of z.read() in a variable (or print it directly).
You should do:
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
i = z.read()
print i
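The difference between the object's description and its contents can be seen without touching the network. This is only a minimal sketch, written for Python 3 (the code above is Python 2); it assumes an in-memory io.BytesIO with made-up HTML as a stand-in for the object that urlopen returns.

```python
import io

# Stand-in for the object returned by urlopen(); the HTML is made up.
z = io.BytesIO(b"<html><body>hello</body></html>")

description = str(z)   # a description of the object, not what it holds
contents = z.read()    # the actual bytes held by the object

print(description)
print(contents)
```

Writing str(z) to a file stores the first value, which is why music.txt ends up holding only the object description.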
.read() does not turn z into the page contents; it returns them as a string and leaves z pointing at the same file object. Use z=z.read() instead.
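A small sketch of what happens to z across calls may help here. It is written for Python 3 and assumes an io.BytesIO with placeholder data in place of the real response; the variable names are only for illustration.

```python
import io

z = io.BytesIO(b"page source")  # hypothetical stand-in for urlopen(...)

first = z.read()    # returns everything and moves to the end of the stream
second = z.read()   # nothing is left to read, so this is empty

# z is still the same stream object; only the returned values hold the data.
z_is_still_stream = isinstance(z, io.BytesIO)

print(first, second, z_is_still_stream)
```

Because the stream is consumed by the first call, the return value has to be kept the first time read() is called.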
z is the file-like object. str(z) just gives you the representation you're seeing.
You need to keep the string (the contents of the file) that's returned by z.read().
Better yet, just iterate over it directly:
import urllib2
z=urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search')
for i in z:
    if '''onclick="javascript:pageTracker._trackPageview('/clicks/''' in i:
        print i
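The same line-by-line filtering can be tried offline. The sketch below is written for Python 3, where a urllib.request.urlopen response yields bytes, so the marker is a bytes literal. The helper name matching_lines and the sample page are hypothetical, and io.BytesIO stands in for the response.

```python
import io

MARKER = b"onclick=\"javascript:pageTracker._trackPageview('/clicks/"


def matching_lines(stream, marker=MARKER):
    """Return the lines of a file-like object that contain marker."""
    return [line for line in stream if marker in line]


# Made-up page standing in for the urlopen() response.
sample = io.BytesIO(
    b"<p>no link here</p>\n"
    b"<a onclick=\"javascript:pageTracker._trackPageview('/clicks/song.mp3');\">song</a>\n"
)

hits = matching_lines(sample)
for line in hits:
    print(line.decode("utf-8").rstrip())
```

No temporary file is involved: the file-like object is iterated once and only the matching lines are kept.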
I think you're missing what read does. Try:
data = z.read()
print data
with open('music.txt','w') as txt:
    txt.write(data)
with open('music.txt','w') as out:
    out.write(urllib2.urlopen('http://skreemr.com/results.jsp?q=said+the+whale&search=SkreemR+Search').read())
But this is just the HTML for the page; you will need to parse it yourself using Beautiful Soup or lxml.
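As a rough illustration of what parsing means here, the sketch below uses the standard library's html.parser rather than Beautiful Soup or lxml, so it runs without extra installs. It is written for Python 3. The class name ClickLinkParser, the sample markup, and the assumption that the interesting links are <a> tags whose onclick mentions '/clicks/' are all hypothetical; the real page may be structured differently.

```python
from html.parser import HTMLParser


class ClickLinkParser(HTMLParser):
    """Collect href values of <a> tags whose onclick mentions '/clicks/'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        onclick = attr_map.get("onclick") or ""
        if "/clicks/" in onclick:
            self.links.append(attr_map.get("href"))


# Made-up markup; the real page source would come from z.read().
sample_html = """
<a href="http://example.com/a.mp3" onclick="javascript:pageTracker._trackPageview('/clicks/a');">A</a>
<a href="http://example.com/other">no tracking</a>
"""

parser = ClickLinkParser()
parser.feed(sample_html)
print(parser.links)
```

Working on tags and attributes instead of raw lines keeps the match from depending on how the HTML happens to be split across lines.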