Repeating data after web scraping with Python and Beautiful Soup 4

Question

I am trying to scrape golf course data from the Garmin site. I want to get the name and address of each golf course, but after running the script I noticed that my code just repeats the first page's data over and over again. I also noticed that the page numbers on the website do not count up by one: the second page is at 10, not 1. How do I extract all the data from this website instead of getting the first page repeated?

import csv
import requests
from bs4 import BeautifulSoup


courses_list = []
for i in range(10):
    url = "http://sites.garmin.com/clsearch/courses?browse=1&country=US&lang=en&per_page={}".format(i)
    r = requests.get(url)

    # Name the parser explicitly to avoid bs4's "no parser specified" warning
    soup = BeautifulSoup(r.content, "html.parser")

    g_data2 = soup.find_all("div", {"class": "result"})

    for item in g_data2:
        # Both lookups and the append belong inside the loop, once per result
        try:
            name = item.contents[3].find_all("div", {"class": "name"})[0].text
            print(name)
        except (IndexError, AttributeError):
            name = ''
        try:
            address = item.contents[3].find_all("div", {"class": "location"})[0].text
        except (IndexError, AttributeError):
            address = ''

        course = [name, address]
        courses_list.append(course)


# Python 3: let the file object handle UTF-8 instead of encoding each cell
with open('G_Final.csv', 'a', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow(row)
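Separately from the URL, the script opens `G_Final.csv` in append mode (`'a'`), so every rerun adds the same rows to the file again, which can also look like repeated data. A minimal sketch, assuming Python 3 and that an identical (name, address) row is a duplicate, that skips repeated rows while writing; `write_unique_rows` is a hypothetical helper name, not part of any library:

```python
import csv
import io


def write_unique_rows(rows, out_file):
    """Write rows to out_file as CSV, skipping exact duplicates.

    Keeps the first occurrence of each row and preserves order.
    Returns the number of rows actually written.
    """
    writer = csv.writer(out_file)
    seen = set()
    written = 0
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples can go in a set
        if key in seen:
            continue
        seen.add(key)
        writer.writerow(row)
        written += 1
    return written


# In-memory demo; for the real file, open("G_Final.csv", "w", newline="",
# encoding="utf-8") would start the file fresh on each run instead of appending.
buf = io.StringIO()
rows = [["Course A", "Town A"], ["Course B", "Town B"], ["Course A", "Town A"]]
count = write_unique_rows(rows, buf)
print(count)  # 2
```

Deduplicating only hides the symptom; if every page returns the same rows, the request URL is still the thing to fix.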

Answer 1

You have already found the problem.

Then change

url = "http://...?browse=1&country=US&lang=en&per_page={}".format(i)

to

url = "http://...?browse=1&country=US&lang=en&per_page={}".format(i*20)
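The same fix can be written by building the query string with `urllib.parse.urlencode` instead of editing the URL by hand. This is a sketch assuming, as in this answer, 20 results per page and that `per_page` is the offset of the first result on a page; the base URL and parameters are copied from the question, while `page_urls` and `PAGE_SIZE` are illustrative names:

```python
from urllib.parse import urlencode

BASE_URL = "http://sites.garmin.com/clsearch/courses"
PAGE_SIZE = 20  # assumption taken from the answer's i*20


def page_urls(num_pages, page_size=PAGE_SIZE):
    """Yield one search URL per page, advancing the offset by page_size."""
    for i in range(num_pages):
        params = {
            "browse": 1,
            "country": "US",
            "lang": "en",
            "per_page": i * page_size,  # 0, 20, 40, ... instead of 0, 1, 2, ...
        }
        yield BASE_URL + "?" + urlencode(params)


for url in page_urls(3):
    print(url)
```

Each URL would then be fetched with `requests.get(url)` and parsed with the question's loop body. If the real page size turns out to differ (the question mentions 10 for the second page), only `page_size` needs to change.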

Answer 2

Just change it to this:

for i in range(0, 10):
    url = f"http://sites.garmin.com/clsearch/courses?browse=1&country=US&lang=en&per_page={i}"
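The `{i}` placeholder is only filled in when the string is an f-string (or passed through `.format`); a plain string literal keeps the text `{i}` unchanged. Also, `range(0, 10)` still yields 0 through 9, the same values as the question's `range(10)`, so the interpolated URLs are identical to the question's. A short check using only string formatting (no network access):

```python
i = 3
plain = "per_page={i}"         # no f prefix: the braces are literal text
fstr = f"per_page={i}"         # f-string: {i} is replaced by the value of i
fmt = "per_page={}".format(i)  # what the question's code does

print(plain)  # per_page={i}
print(fstr)   # per_page=3
print(fmt)    # per_page=3

# Both interpolating forms produce the same offsets 0..9 over ten pages.
question_urls = ["per_page={}".format(n) for n in range(10)]
answer_urls = [f"per_page={n}" for n in range(0, 10)]
print(question_urls == answer_urls)  # True
```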

Answer 1: score 1, posted 2015-06-29 16:29:48

Answer 2: score -1, posted 2021-02-12 19:56:51
