UnicodeEncodeError: Scraping data using Python and beautifulsoup4
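I am trying to scrape data from the PGA website to get a list of all golf courses in the USA. I want to scrape the data and write it to a CSV file. My problem is that I get this error after running my script. Can anyone help me fix the error and show how to extract the data?

Here is the error message:

    File "/Users/AGB/Final_PGA2.py", line 44
        writer.writerow(row)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 35: ordinal not in range(128)

The script is below: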

    import csv
    import requests
    from bs4 import BeautifulSoup

    courses_list = []
    for i in range(906): # Number of pages plus one
        url = "http://www.pga.com/golf-courses/search?page={}&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0".format(i)
        r = requests.get(url)
        soup = BeautifulSoup(r.content)
        g_data2=soup.find_all("div",{"class":"views-field-nothing"})
        for item in g_data2:
            try:
                name = item.contents[1].find_all("div",{"class":"views-field-title"})[0].text
                print name
            except:
                name=''
            try:
                address1=item.contents[1].find_all("div",{"class":"views-field-address"})[0].text
            except:
                address1=''
            try:
                address2=item.contents[1].find_all("div",{"class":"views-field-city-state-zip"})[0].text
            except:
                address2=''
            try:
                website=item.contents[1].find_all("div",{"class":"views-field-website"})[0].text
            except:
                website=''
            try:
                Phonenumber=item.contents[1].find_all("div",{"class":"views-field-work-phone"})[0].text
            except:
                Phonenumber=''
            course=[name,address1,address2,website,Phonenumber]
            courses_list.append(course)

    with open ('PGA_Final.csv','a') as file:
        writer=csv.writer(file)
        for row in courses_list:
            writer.writerow(row)

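You should not get this error on Python 3. Here is a code example that also fixes some unrelated issues in your code. It parses the specified fields on a given web page and saves them as CSV: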

    #!/usr/bin/env python3
    import csv
    from urllib.request import urlopen

    import bs4 # $ pip install beautifulsoup4

    page = 905
    url = ("http://www.pga.com/golf-courses/search?page=" + str(page) +
           "&searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0"
           "&course_type=both&has_events=0")
    with urlopen(url) as response:
        field_content = bs4.SoupStrainer('div', 'views-field-nothing')
        soup = bs4.BeautifulSoup(response, parse_only=field_content)

    fields = [bs4.SoupStrainer('div', 'views-field-' + suffix)
              for suffix in ['title', 'address', 'city-state-zip', 'website', 'work-phone']]

    def get_text(tag, default=''):
        return tag.get_text().strip() if tag is not None else default

    with open('pga.csv', 'w', newline='') as output_file:
        writer = csv.writer(output_file)
        for div in soup.find_all(field_content):
            writer.writerow([get_text(div.find(field)) for field in fields])

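The reason Python 3 avoids the error is that its csv module writes text, and the file object does the encoding. The following is a minimal sketch of that behaviour, not part of the answer above. The sample row and the temporary file path are made up for illustration, and passing encoding='utf-8' explicitly is an added assumption: without it, open() falls back to the platform's default encoding, which may not be able to represent every character.

```python
import csv
import os
import tempfile

# Hypothetical row standing in for one scraped course; the name contains
# the curly quotes (U+201C / U+201D) that triggered the original error.
courses_list = [
    ["The \u201cOld\u201d Course", "1 Fairway Rd", "Springfield, XX 00000", "", ""],
]

# Write to a temporary directory so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "PGA_Final.csv")

# encoding='utf-8' makes the output independent of the platform default;
# newline='' lets the csv module control line endings itself.
with open(path, "a", encoding="utf-8", newline="") as output_file:
    writer = csv.writer(output_file)
    writer.writerows(courses_list)

# Read the file back to confirm that the non-ASCII text survived.
with open(path, encoding="utf-8", newline="") as input_file:
    rows_read = list(csv.reader(input_file))

print(rows_read == courses_list)
```

The same open() arguments could be used in place of open('pga.csv', 'w', newline='') in the script above if the default encoding on your machine turns out to be a problem.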
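Change this part of the script:

    with open ('PGA_Final.csv','a') as file:
        writer=csv.writer(file)
        for row in courses_list:
            writer.writerow(row)

to the following (on Python 2). Each cell is encoded rather than the row itself, because row is a list and has no encode method:

    with open ('PGA_Final.csv','a') as file:
        writer=csv.writer(file)
        for row in courses_list:
            writer.writerow([cell.encode('utf-8') for cell in row])
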
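Or, on Python 3, open the file with an explicit encoding. (On Python 2 the csv module converts unicode cells to ASCII itself before writing, so wrapping the file with codecs does not avoid the error there.)

    import codecs
    ....
    with codecs.open('PGA_Final.csv','a', encoding='utf-8') as file:
        writer=csv.writer(file)
        for row in courses_list:
            writer.writerow(row)
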
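Both fixes address the same underlying cause: the left curly quote u'\u201c' has no ASCII representation, so any implicit conversion to ASCII fails, while UTF-8 can represent it. Here is a small sketch of that difference, written in Python 3 syntax. The sample text and the try_encode helper are hypothetical and only serve the illustration.

```python
# Hypothetical course name containing curly quotes (U+201C and U+201D).
text = "The \u201cOld Course\u201d"


def try_encode(value, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# ASCII only covers code points 0-127, so this returns an error message.
ascii_result = try_encode(text, "ascii")

# UTF-8 encodes each curly quote as a three-byte sequence.
utf8_result = try_encode(text, "utf-8")

print(ascii_result)
print(utf8_result)
```

The first print shows the same "ordinal not in range(128)" message as the traceback in the question. The second shows the bytes that end up in the CSV file once the text is encoded as UTF-8.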