Writing and saving a CSV file from scraping data using Python and BeautifulSoup4

I am trying to scrape data from the PGA.com website to get a table of all of the golf courses in the United States. In my CSV table I want to include the name of the golf course, address, ownership, website, and phone number. With this data I would like to geocode it, place it onto a map, and keep a local copy on my computer.

I used Python and Beautiful Soup 4 to extract my data. I have gotten as far as extracting the data from the website, but I am having difficulty writing the script to export the data into a CSV file with the fields I need.

Attached below is my script. I need help creating code that will transfer my extracted data into a CSV file and save it to my desktop.

Here is my script:

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.pga.com/golf-courses/search?searchbox=Course+Name&searchbox_zip=ZIP&distance=50&price_range=0&course_type=both&has_events=0"
r = requests.get(url)

soup = BeautifulSoup(r.content, "html.parser")  # pass an explicit parser

# The two container divs that hold the course details
g_data1 = soup.find_all("div", {"class": "views-field-nothing-1"})
g_data2 = soup.find_all("div", {"class": "views-field-nothing"})

for item in g_data1:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-counter"})[0].text)
    except:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-course-type"})[0].text)
    except:
        pass

for item in g_data2:
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-title"})[0].text)
    except:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-address"})[0].text)
    except:
        pass
    try:
        print(item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text)
    except:
        pass

This is what I currently get when I run the script. I want to take this data and turn it into a CSV table for geocoding later.

1801 Merrimac Trl
Williamsburg, Virginia 23185-5905

12551 Glades Rd
Boca Raton, Florida 33498-6830
Preserve Golf Club 
13601 SW 115th Ave
Dunnellon, Florida 34432-5621
1000 Acres Ranch Resort 
465 Warrensburg Rd
Stony Creek, New York 12878-1613
1757 Golf Club 
45120 Waxpool Rd
Dulles, Virginia 20166-6923
27 Pines Golf Course 
5611 Silverdale Rd
Sturgeon Bay, Wisconsin 54235-8308
3 Creek Ranch Golf Club 
2625 S Park Loop Rd
Jackson, Wyoming 83001-9473
3 Lakes Golf Course 
6700 Saltsburg Rd
Pittsburgh, Pennsylvania 15235-2130
3 Par At Four Points 
8110 Aero Dr
San Diego, California 92123-1715
3 Parks Fairways 
3841 N Florence Blvd
Florence, Arizona 85132
3-30 Golf & Country Club 
101 Country Club Lane
Lowden, Iowa 52255
401 Par Golf 
5715 Fayetteville Rd
Raleigh, North Carolina 27603-4525
93 Golf Ranch 
406 E 200 S
Jerome, Idaho 83338-6731
A 1 Golf Center 
1805 East Highway 30
Rockwall, Texas 75087
A H Blank Municipal Course 
808 County Line Rd
Des Moines, Iowa 50320-6706
A-Bar-A Ranch Golf Course 
Highway 230
Encampment, Wyoming 82325
A-Ga-Ming Golf Resort, Sundance 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A-Ga-Ming Golf Resort, Torch 
627 Ag A Ming Dr
Kewadin, Michigan 49648-9397
A. C. Read Golf Club, Bayou 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508
A. C. Read Golf Club, Bayview 
Bldg 3495, Nas Pensacola
Pensacola, Florida 32508

All you really need to do here is put your output in a list and then use the csv library to export it. I'm not entirely clear on what you are getting out of views-field-nothing-1, but to focus just on views-field-nothing, you could do something like:

courses_list = []

for item in g_data2:
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
    except:
        name = ''
    try:
        address1 = item.contents[1].find_all("div", {"class": "views-field-address"})[0].text
    except:
        address1 = ''
    try:
        address2 = item.contents[1].find_all("div", {"class": "views-field-city-state-zip"})[0].text
    except:
        address2 = ''

    course = [name, address1, address2]
    courses_list.append(course)

This will put the courses in a list; next you can write them to a CSV like so:

import csv

with open('filename.csv', 'w', newline='') as f:  # text mode with newline='' for the csv module
    writer = csv.writer(f)
    for row in courses_list:                      # matches the list built above
        writer.writerow(row)
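
Since the goal is to save the file to the desktop for geocoding, you may also want a header row and an explicit output path. A minimal sketch, assuming the courses_list built above (the desktop path and column names are just illustrative):

import csv
import os

# Hypothetical output location on the desktop; adjust to taste.
out_path = os.path.join(os.path.expanduser("~"), "Desktop", "golf_courses.csv")

with open(out_path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "Address", "City/State/ZIP"])  # illustrative header row
    writer.writerows(courses_list)                          # one row per course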

First of all, you want to put all of your items in a list and then write them to a file afterwards, in case there is an error while you are scraping. Instead of printing, just append to a list — a sketch of that step follows — and then write the list to a CSV file.
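
A minimal sketch of the collection step, assuming main_list is the list the writer below consumes (the row structure is hypothetical and mirrors the other answer):

main_list = []
for item in g_data2:
    try:
        name = item.contents[1].find_all("div", {"class": "views-field-title"})[0].text
        main_list.append([name])  # append a row instead of printing it; add more fields as needed
    except:
        pass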

import csv

f = open('filename.csv', 'w', newline='')  # text mode with newline='' for the csv module
csv_writer = csv.writer(f)
for i in main_list:
    csv_writer.writerow(i)
f.close()
