Using string array in Python 2.7
I'm trying to do some scraping using Python 2.7.2. I've just started with Python, and unfortunately it is not as intuitive as I thought it would be. I'm trying to collect all the matching `<span>` elements from all pages, but I don't know how to accumulate the results from every page in one string array. So far I'm getting results from one page only. I know this is a super easy question for people who write Python, so please help me. Here is the code:
import urllib
import re

j = 1
while j < 10:
    url = "http://www.site.com/search?page=" + str(j) + "&query=keyword"
    print url
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span class="class33">(.+?)</span>'
    pattern = re.compile(regex)
    spans = re.findall(pattern, htmltext)
    # spans[j] instead of spans doesn't work
    # spans.append(spans) doesn't work
    j += 1

i = 0
while i < len(spans):
    print spans[i]
    i += 1
Outside the while loop, initialize s to the empty list:

s = []

Then inside the while loop:

s.extend(re.findall(pattern, htmltext))

If you prefer, s += re.findall(pattern, htmltext) will do the same.
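For lists, `+=` and `extend` really are interchangeable here: both grow the accumulator in place. A minimal sketch (using a hard-coded HTML snippet rather than a live page, so it runs offline) showing the two forms collect the same results:

```python
import re

# Stand-in for one page's HTML (hypothetical content)
htmltext = '<span class="class33">one</span><span class="class33">two</span>'
pattern = re.compile('<span class="class33">(.+?)</span>')

# Accumulate with extend
s1 = []
s1.extend(re.findall(pattern, htmltext))

# Accumulate with += (in-place extension for lists)
s2 = []
s2 += re.findall(pattern, htmltext)

assert s1 == s2 == ['one', 'two']
```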
Change

spans = re.findall(pattern, htmltext)

to

spans.extend(re.findall(pattern, htmltext))

I'd also change your loop syntax a bit:
import urllib
import re

spans = []
for j in range(1, 11):
    url = "http://www.site.com/search?page=" + str(j) + "&query=keyword"
    print url
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span class="class33">(.+?)</span>'
    pattern = re.compile(regex)
    spans.extend(re.findall(pattern, htmltext))

for span in spans:
    print span
Before your loop, define spans:

spans = []

Then in your loop:

spans.extend(re.findall(pattern, htmltext))

The findall method returns a list. You want to extend the spans list with the new spans on each iteration.
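Since `re.findall` returns a plain list of captured groups, extending merges each page's matches into the one accumulator instead of overwriting it. A minimal sketch of the pattern, using hard-coded page contents (hypothetical) in place of the `urllib` fetches:

```python
import re

pattern = re.compile('<span class="class33">(.+?)</span>')

# Stand-ins for the HTML of several result pages (hypothetical content)
pages = [
    '<span class="class33">alpha</span>',
    '<span class="class33">beta</span><span class="class33">gamma</span>',
]

spans = []
for htmltext in pages:
    matches = re.findall(pattern, htmltext)  # findall returns a list per page
    spans.extend(matches)                    # merge into the single accumulator

assert spans == ['alpha', 'beta', 'gamma']
```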