
how to scrape multipage website with python and export data into .csv file?

I would like to scrape the following website using python and need to export the scraped data into a CSV file:

http://www.swisswine.ch/en/producer?search=&&

This website consists of 154 pages of relevant search results. I need to call every page and scrape the data, but my script can't move on to the next pages continuously. It only scrapes one page of data.

Here I assign the value i < 153, so this script runs only for the 154th page and gives me 10 records. I need data from the 1st to the 154th page.

How can I scrape all the data from every page in one run of the script, and how can I export the data as a CSV file?

My script is as follows:

import csv
import requests
from bs4 import BeautifulSoup
i = 0
while i < 153:       
     url = ("http://www.swisswine.ch/en/producer?search=&&&page=" + str(i))
     r = requests.get(url)
     i=+1
     r.content

soup = BeautifulSoup(r.content)
print (soup.prettify())


g_data = soup.find_all("ul", {"class": "contact-information"})
for item in g_data:
      print(item.text)

You should put your HTML parsing code under the loop as well. And you are not incrementing the i variable correctly (thanks @MattDMo):

import csv
import requests
from bs4 import BeautifulSoup

i = 0
while i < 153:
    url = "http://www.swisswine.ch/en/producer?search=&&&page=" + str(i)
    r = requests.get(url)
    i += 1

    soup = BeautifulSoup(r.content)
    print(soup.prettify())

    g_data = soup.find_all("ul", {"class": "contact-information"})
    for item in g_data:
        print(item.text)

I would also improve the following:

  • use requests.Session() to maintain a web-scraping session, which will also bring a performance boost:

    if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase

  • be explicit about the underlying parser for BeautifulSoup:

     soup = BeautifulSoup(r.content, "html.parser") # or "lxml", or "html5lib" 
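Putting the suggestions together with the CSV export the question asks for, a minimal sketch might look like the following. The output filename producers.csv, the single header column name, and the assumption that pages are numbered 0 through 153 are mine, not confirmed by the site:

```python
import csv
import requests
from bs4 import BeautifulSoup

# Assumed URL pattern and page range (0..153) based on the question.
BASE_URL = "http://www.swisswine.ch/en/producer?search=&&&page="

def extract_contacts(html):
    """Parse one page of HTML and return the text of each contact-information list."""
    soup = BeautifulSoup(html, "html.parser")
    return [item.get_text(" ", strip=True)
            for item in soup.find_all("ul", {"class": "contact-information"})]

def scrape_to_csv(path="producers.csv", pages=154):
    """Fetch every page with one Session and write all contacts to a CSV file."""
    session = requests.Session()  # reuses the TCP connection between requests
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["contact_information"])  # assumed header name
        for page in range(pages):
            r = session.get(BASE_URL + str(page))
            for text in extract_contacts(r.content):
                writer.writerow([text])
```

Separating the parsing into its own function keeps the network loop small and lets you test the extraction against saved HTML without hitting the site.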
