Writing only one row after website scraping

I am trying to extract a list of all the golf courses in the USA through this link. I need to extract the name, address, and phone number of each golf course. My script is supposed to extract all the data from the website, but it looks like it only prints one row in my CSV file. I noticed that when I print the "name" field it only prints once, despite the find_all function. All I need is the full data set from the multiple listings on the website, not just one field.

How do I go about fixing my script so that it writes all the needed data into a CSV file?

Here is my script:

import csv
import requests 
from bs4 import BeautifulSoup

courses_list = []

for i in range(1):
    url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750"  #.format(i)
    r = requests.get(url)
    soup = BeautifulSoup(r.content)

g_data2 = soup.find_all("div", {"class": "list"})

for item in g_data2:
    try:
        name = item.contents[7].find_all("a", {"class": "entry-title"})[0].text
        print name
    except:
        name = ''
    try:
        phone = item.contents[7].find_all("p", {"class": "listing-phone"})[0].text
    except:
        phone = ''
    try:
        address = item.contents[7].find_all("p", {"class": "listing-address"})[0].text
    except:
        address = ''

    course = [name, phone, address]
    courses_list.append(course)

with open('PGN_Final.csv', 'a') as file:
    writer = csv.writer(file)
    for row in courses_list:
        writer.writerow([s.encode("utf-8") for s in row])

Here is a neat implementation for your code. You can use the library urllib2 instead of requests, and bs4 works the same way. Your original version most likely writes a single row because soup.find_all("div", {"class": "list"}) matches one wrapper div, so the loop body runs only once and each find_all(...)[0] keeps just the first course inside it; grabbing each field list straight from the page avoids that.

import csv
import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750"
r = urllib2.urlopen(url).read()
soup = BeautifulSoup(r)

courses_list = []
courses_list.append(("Course name", "Phone Number", "Address"))

# Grab each field list straight from the page, then pair them up by position.
names = soup.findAll('h2', attrs={'class': 'entry-title'})
phones = soup.findAll('p', attrs={'class': 'listing-phone'})
addresses = soup.findAll('p', attrs={'class': 'listing-address'})
for na, ph, add in zip(names, phones, addresses):
    courses_list.append((na.text, ph.text, add.text))

with open('PGN_Final.csv', 'a') as f:
    writer = csv.writer(f)
    for row in courses_list:
        # csv in Python 2 needs byte strings, so encode each field as UTF-8.
        writer.writerow([s.encode("utf-8") for s in row])
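If you ever move this to Python 3, the same idea carries over to requests and bs4, which your original script already imports. Below is a minimal sketch under that assumption; the selectors h2.entry-title, p.listing-phone, and p.listing-address are taken from the code above, and note that zip pairs the three lists by position, so a listing that lacks a phone or address would shift the columns.

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.thegolfcourses.net/page/1?ls&location=California&orderby=title&radius=6750#038;location=California&orderby=title&radius=6750"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly

names = soup.find_all("h2", class_="entry-title")
phones = soup.find_all("p", class_="listing-phone")
addresses = soup.find_all("p", class_="listing-address")

with open("PGN_Final.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Course name", "Phone Number", "Address"])
    # zip stops at the shortest list, so rows misalign if any field is missing
    for na, ph, add in zip(names, phones, addresses):
        writer.writerow([na.get_text(strip=True),
                         ph.get_text(strip=True),
                         add.get_text(strip=True)])

In Python 3 the csv module handles text natively, so the per-field .encode("utf-8") calls from the Python 2 version are no longer needed.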
