CSV is not written correctly after scraping website data with Python and BeautifulSoup

Question

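I am having trouble writing a CSV file after scraping data from a website. My goal is to scrape a list of the names and addresses of golf courses in the United States. I use .get_text(separator=' ') for the address to get rid of the <br> tags that break up the address text, but when the CSV is written I get only three entries out of the 893 iterations. What should I do to get the right amount of scraped data, and how do I fix the script so that it scrapes everything correctly?

Here is my script: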

import csv
import requests
from bs4 import BeautifulSoup

courses_list = []

for i in range(893): #893
    url="http://sites.garmin.com/clsearch/courses/search?course=&location=&country=US&state=&holes=&radius=&lang=en&search_submitted=1&per_page={}".format(i*20)
    r = requests.get(url)
    soup = BeautifulSoup(r.text)
    g_data2 = soup.find_all("div",{"class":"result"})
    #print g_data

    for item in g_data2:
        try:
            name = item.find_all("div",{"class":"name"})[0].text
        except:
            name=''
            print "No Name found!"
        try:
            address= item.find_all("div",{"class":"location"})[0].get_text(separator=' ')
            print address
        except:
            address=''
            print "No Address found!"

course=[name,address]
courses_list.append(course)

with open ('Garmin_GC.csv','a') as file:
     writer=csv.writer(file)
     for row in courses_list:
         writer.writerow([s.encode("utf-8") for s in row])

Answer 1

If that is your actual indentation, it is wrong: you need to append the name and address inside the loop. This should add all the data:

import csv
import requests
from bs4 import BeautifulSoup

courses_list = []
with open('Garmin_GC.csv', 'w') as file:
    for i in range(893):  #893
        url = "http://sites.garmin.com/clsearch/courses/search?course=&location=&country=US&state=&holes=&radius=&lang=en&search_submitted=1&per_page={}".format(
            i * 20)
        r = requests.get(url)
        soup = BeautifulSoup(r.text)
        g_data2 = soup.find_all("div", {"class": "result"})
        for item in g_data2:
            try:
                name = item.find_all("div", {"class": "name"})[0].text
            except IndexError:
                name = ''
                print "No Name found!"
            try:
                address = item.find_all("div", {"class": "location"})[0].get_text(separator=' ')
                print address
            except IndexError:
                address = ''
                print "No Address found!"
            course = [name, address]
            courses_list.append(course)


    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow([s.encode("utf-8") for s in row])
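
The effect of where the append sits can be checked without touching the network. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in the scripts above; the `pages` data and both function names are made up for illustration and stand in for the scraped results.

```python
# Hypothetical stand-in for the scraped pages: each page is a list of
# (name, address) pairs. All values are invented for illustration.
pages = [
    [("Course A", "1 First St"), ("Course B", "2 Second St")],
    [("Course C", "3 Third St")],
]


def collect_outside_loop(pages):
    """Mimic the question's indentation: append runs once, after all loops."""
    courses = []
    for page in pages:
        for name, address in page:
            pass  # name and address are overwritten on every iteration
    courses.append([name, address])  # only the last item is ever stored
    return courses


def collect_inside_loop(pages):
    """Append inside the inner loop so that every item is stored."""
    courses = []
    for page in pages:
        for name, address in page:
            courses.append([name, address])
    return courses


broken = collect_outside_loop(pages)
fixed = collect_inside_loop(pages)
print(len(broken), len(fixed))  # prints: 1 3
```

With the append dedented to the top level, only the final item survives. Moving it into the inner loop keeps one row per result.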

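You can open the file outside the loop and write once when you are done. If you do not want to store all the data in a list, just write a row on each iteration: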

with open('Garmin_GC.csv', 'w') as file:
    writer = csv.writer(file)
    for i in range(3):  #893
        url = "http://sites.garmin.com/clsearch/courses/search?course=&location=&country=US&state=&holes=&radius=&lang=en&search_submitted=1&per_page={}".format(
            i * 20)
        r = requests.get(url)
        soup = BeautifulSoup(r.text)
        g_data2 = soup.find_all("div", {"class": "result"})
        for item in g_data2:
            try:
                name = item.find_all("div", {"class": "name"})[0].text
            except IndexError:
                name = ''
                print "No Name found!"
            try:    
                address = item.find_all("div", {"class": "location"})[0].get_text(separator=' ')
                print address
            except IndexError:
                address = ''
                print "No Address found!"
            writer.writerow([name.encode("utf-8"), address.encode("utf-8")])
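
For readers on Python 3, the same write-as-you-go idea might look like the sketch below. It assumes the rows have already been extracted; `write_rows` and the sample values are hypothetical. In Python 3 the csv module works with text, so the strings are passed as they are, without `.encode("utf-8")`, and the encoding is chosen when the file is opened.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write each (name, address) pair as soon as it is available."""
    writer = csv.writer(fileobj)
    count = 0
    for name, address in rows:
        writer.writerow([name, address])  # plain str values, no .encode()
        count += 1
    return count


# Invented sample rows; the second one contains a non-ASCII character.
sample_rows = [
    ("Example Course", "1 Main St, Sampletown"),
    ("Café Links", "2 Oak Ave"),
]

# In-memory demonstration. For a real file one would open it with
# open("Garmin_GC.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO(newline="")
written = write_rows(sample_rows, buffer)
print(written)
```

Opening the file with `newline=""` is what the csv documentation recommends, because the writer emits its own line endings.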

If a name or address is missing, you may want to add a continue in the except block, if you want to ignore records that lack either or both of them.
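
A minimal sketch of that skip-with-continue pattern, in Python 3 and without the scraping step, could look as follows. The `records` dictionaries and the `complete_rows` name are assumptions standing in for the per-item extraction; in the real script the check would sit inside the `for item in g_data2` loop.

```python
def complete_rows(records):
    """Keep only records that have both a name and an address."""
    rows = []
    for record in records:
        name = record.get("name", "").strip()
        address = record.get("address", "").strip()
        if not name or not address:
            continue  # skip incomplete records instead of writing blanks
        rows.append([name, address])
    return rows


# Invented sample data: one complete record and two incomplete ones.
records = [
    {"name": "Course A", "address": "1 First St"},
    {"name": "Course B"},
    {"address": "3 Third St"},
]

rows = complete_rows(records)
print(rows)  # prints: [['Course A', '1 First St']]
```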

Solution 1
0 2015-07-08 18:00:47
