Write Headers Once in Python CSV Writer Loop
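Below is a scraper that loops through two websites, scrapes each team's roster information, puts it into lists, and then exports those lists to a CSV file. Everything works, except that the header row is written to the CSV file again every time the scraper moves on to the second website. Is it possible to adjust the CSV part of the code so that the header appears only once while the scraper iterates over multiple websites? Thanks in advance!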
import requests
import csv
from bs4 import BeautifulSoup
team_list={'yankees','redsox'}
for team in team_list:
    page = requests.get('http://m.{}.mlb.com/roster/'.format(team))
    soup = BeautifulSoup(page.text, 'html.parser')
    soup.find(class_='nav-tabset-container').decompose()
    soup.find(class_='column secondary span-5 right').decompose()
    roster = soup.find(class_='layout layout-roster')
    names = [n.contents[0] for n in roster.find_all('a')]
    ids = [n['href'].split('/')[2] for n in roster.find_all('a')]
    number = [n.contents[0] for n in roster.find_all('td', index='0')]
    handedness = [n.contents[0] for n in roster.find_all('td', index='3')]
    height = [n.contents[0] for n in roster.find_all('td', index='4')]
    weight = [n.contents[0] for n in roster.find_all('td', index='5')]
    DOB = [n.contents[0] for n in roster.find_all('td', index='6')]
    team = [soup.find('meta',property='og:site_name')['content']] * len(names)
    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
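To see why the header repeats, here is a minimal offline sketch of the question's loop. FAKE_ROWS is hypothetical made-up data standing in for the scraped lists, and an io.StringIO buffer replaces the CSV file, so nothing touches the network or the disk.

```python
import csv
import io

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

# Hypothetical stand-in for the scraped data: one made-up row per team,
# so the example runs without any network access.
FAKE_ROWS = {
    'yankees': [('Player A', '000001', '1', 'R/R', '6-0', '200', '01/01/1990', 'Yankees')],
    'redsox': [('Player B', '000002', '2', 'L/L', '6-2', '210', '02/02/1991', 'Red Sox')],
}

def write_like_question(buffer):
    """Mimic the question's loop: the header is written on every iteration."""
    for team in ['yankees', 'redsox']:
        f = csv.writer(buffer)
        f.writerow(HEADER)              # runs once per team, hence the duplicate header
        f.writerows(FAKE_ROWS[team])

buf = io.StringIO()
write_like_question(buf)
buf.seek(0)
# Count how many rows in the output are identical to the header row.
header_count = sum(1 for row in csv.reader(buf) if row == HEADER)
print(header_count)
```

Because writerow(HEADER) sits inside the loop body, it executes once per team, so two teams produce two header rows.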
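It may help to use a variable that records whether the header has already been added. Once the header has been written, it won't be added a second time: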
header_added = False
for team in team_list:
    # ... fetch and parse the page exactly as in the question ...
    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        if not header_added:
            # only the first iteration gets here
            f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
            header_added = True
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
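The snippet above relies on the scraping code from the question, so here is a self-contained sketch of the same flag idea that can be run offline. fetch_roster is a hypothetical stub that returns made-up rows instead of calling requests/BeautifulSoup, and read_rows is a small helper added only to inspect the result.

```python
import csv
import os
import tempfile

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

def fetch_roster(team):
    """Hypothetical stand-in for the scraping step: returns one made-up row."""
    return [(f'{team} player', '000000', '0', 'R/R', '6-0', '200', '01/01/1990', team)]

def write_rosters_with_flag(path, teams):
    header_added = False
    for team in teams:
        rows = fetch_roster(team)
        with open(path, 'a', newline='') as fp:
            f = csv.writer(fp)
            if not header_added:
                f.writerow(HEADER)
                header_added = True      # later iterations skip the header
            f.writerows(rows)

def read_rows(path):
    """Hypothetical helper: read the CSV back as a list of lists."""
    with open(path, newline='') as fp:
        return list(csv.reader(fp))

if __name__ == '__main__':
    out = os.path.join(tempfile.mkdtemp(), 'MLB_Active_Roster.csv')
    write_rosters_with_flag(out, ['yankees', 'redsox'])
    for row in read_rows(out):
        print(row)
```

One caveat: header_added is reset every time the script starts, and the file is opened in 'a' mode, so running the script a second time appends a second header.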
Another approach is simply to write the header before the for loop, so you don't have to check whether it has already been written:
import requests
import csv
from bs4 import BeautifulSoup
team_list = {'yankees', 'redsox'}
# Write the header once, before the loop ('w' starts a fresh file)
with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
for team in team_list:
    # ... fetch and parse the page exactly as in the question ...
    with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
        f = csv.writer(fp)
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
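A runnable sketch of this header-first variant, again with a hypothetical fetch_roster stub in place of the scraping code and a hypothetical read_rows helper for checking the output. It also shows a side effect of writing the header with 'w': the file is truncated at the start of each run, so running the script twice still leaves exactly one header.

```python
import csv
import os
import tempfile

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

def fetch_roster(team):
    """Hypothetical stand-in for the scraping step: returns one made-up row."""
    return [(f'{team} player', '000000', '0', 'R/R', '6-0', '200', '01/01/1990', team)]

def write_rosters_header_first(path, teams):
    # 'w' truncates any existing file, so every run starts with a single header.
    with open(path, 'w', newline='') as fp:
        csv.writer(fp).writerow(HEADER)
    # Each team's rows are then appended below the header.
    for team in teams:
        with open(path, 'a', newline='') as fp:
            csv.writer(fp).writerows(fetch_roster(team))

def read_rows(path):
    """Hypothetical helper: read the CSV back as a list of lists."""
    with open(path, newline='') as fp:
        return list(csv.reader(fp))

if __name__ == '__main__':
    out = os.path.join(tempfile.mkdtemp(), 'MLB_Active_Roster.csv')
    write_rosters_header_first(out, ['yankees', 'redsox'])
    write_rosters_header_first(out, ['yankees', 'redsox'])  # simulate a second run
    for row in read_rows(out):
        print(row)
```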
You can also open the file just once instead of three times:
import requests
import csv
from bs4 import BeautifulSoup
team_list = {'yankees', 'redsox'}
with open('MLB_Active_Roster.csv', 'w', newline='') as fp:
    f = csv.writer(fp)
    f.writerow(['Name','ID','Number','Hand','Height','Weight','DOB','Team'])
    for team in team_list:
        # ... fetch and parse the page exactly as in the question ...
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
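A sketch of the open-once version, assuming a hypothetical fetch_roster stub stands in for the scraping step. As a design choice of this sketch (not of the answer), the function receives an already-open file object rather than a filename, so the same code can write to a real file or to an io.StringIO buffer for quick testing.

```python
import csv
import io

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

def fetch_roster(team):
    """Hypothetical stand-in for the scraping step: returns one made-up row."""
    return [(f'{team} player', '000000', '0', 'R/R', '6-0', '200', '01/01/1990', team)]

def write_rosters(fp, teams):
    """Write one header, then every team's rows, to an already-open file object."""
    f = csv.writer(fp)
    f.writerow(HEADER)                   # written once, before the loop
    for team in teams:
        f.writerows(fetch_roster(team))

if __name__ == '__main__':
    buffer = io.StringIO()
    write_rosters(buffer, ['yankees', 'redsox'])
    print(buffer.getvalue())
```

In the real script the call would sit inside the single with block, e.g. with open('MLB_Active_Roster.csv', 'w', newline='') as fp: write_rosters(fp, team_list).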
Just write the headers before the loop, and then do the loop inside the with context manager:
import requests
import csv
from bs4 import BeautifulSoup
team_list = {'yankees', 'redsox'}
headers = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']
# 1. wrap everything in context manager
with open('MLB_Active_Roster.csv', 'a', newline='') as fp:
    f = csv.writer(fp)
    # 2. write headers before anything else
    f.writerow(headers)
    # 3. now process the loop
    for team in team_list:
        # Do everything else: fetch and parse the page as in the question, then
        f.writerows(zip(names, ids, number, handedness, height, weight, DOB, team))
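Note that this answer opens the file in 'a' mode, so each run of the script appends another header row. If appending across runs is actually what you want, one option (an addition, not part of the answer) is to write the header only when the file is missing or empty. A minimal sketch, with fetch_roster and read_rows as hypothetical helpers:

```python
import csv
import os
import tempfile

HEADER = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

def fetch_roster(team):
    """Hypothetical stand-in for the scraping step: returns one made-up row."""
    return [(f'{team} player', '000000', '0', 'R/R', '6-0', '200', '01/01/1990', team)]

def append_rosters(path, teams):
    # Check before opening: a missing or empty file still needs its header.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='') as fp:
        f = csv.writer(fp)
        if needs_header:
            f.writerow(HEADER)
        for team in teams:
            f.writerows(fetch_roster(team))

def read_rows(path):
    """Hypothetical helper: read the CSV back as a list of lists."""
    with open(path, newline='') as fp:
        return list(csv.reader(fp))

if __name__ == '__main__':
    out = os.path.join(tempfile.mkdtemp(), 'MLB_Active_Roster.csv')
    append_rosters(out, ['yankees', 'redsox'])
    append_rosters(out, ['yankees', 'redsox'])  # second run: rows appended, no new header
    for row in read_rows(out):
        print(row)
```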
You can also define the headers outside the loop, alongside team_list, which makes the code easier to read.
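Taking the idea of defining the headers once a step further, the standard library's csv.DictWriter can reuse that same list as its fieldnames and emit it with writeheader(). This is a swapped-in alternative rather than what the answers above use; fetch_roster is a hypothetical stub returning one made-up dict per team.

```python
import csv
import io

# Defined once, outside the loop, and reused as the DictWriter field names.
HEADERS = ['Name', 'ID', 'Number', 'Hand', 'Height', 'Weight', 'DOB', 'Team']

def fetch_roster(team):
    """Hypothetical stand-in for the scraping step: returns dicts keyed by HEADERS."""
    values = (f'{team} player', '000000', '0', 'R/R', '6-0', '200', '01/01/1990', team)
    return [dict(zip(HEADERS, values))]

def write_rosters(fp, teams):
    writer = csv.DictWriter(fp, fieldnames=HEADERS)
    writer.writeheader()                 # called exactly once, before the loop
    for team in teams:
        writer.writerows(fetch_roster(team))

if __name__ == '__main__':
    buffer = io.StringIO()
    write_rosters(buffer, ['yankees', 'redsox'])
    print(buffer.getvalue())
```

A side benefit of DictWriter is that each value is matched to its column by name, so the column order lives in one place (HEADERS).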
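Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.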