Where am I going wrong with this scraping?
It should be really simple, but I'm struggling to pull out each row from this NCAA table (e.g. Florida State, ACC, 22-1-2, etc.).
I guess my main question here is: where do I start? What am I looking for? Do I search for the 'div' tag, the 'tbody' tag, or the 'tr' tag? Whichever one I try, with find_all, find, or even select using a CSS selector, returns nothing.
https://www.ncaa.com/rankings/soccer-women/d1/ncaa-womens-soccer-rpi
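On the "where do I start" question, one common approach is to go from the table down to its rows and then to each row's cells. The sketch below runs against a small inline HTML sample rather than the live page. `SAMPLE_HTML` and `extract_rows` are hypothetical names made up for illustration, and the real page's markup may differ from this trimmed-down stand-in.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the rankings table markup.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Rank</th><th>School</th><th>Conference</th><th>Record</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>Florida St.</td><td>ACC</td><td>22-1-2</td></tr>
    <tr><td>2</td><td>Duke</td><td>ACC</td><td>16-4-1</td></tr>
  </tbody>
</table>
"""


def extract_rows(html):
    """Return one list of cell strings per data row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No <table> in the markup we received, so there is nothing to parse.
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row holds only <th> cells, so it yields none
            rows.append(cells)
    return rows


rows = extract_rows(SAMPLE_HTML)
print(rows)
```

If a search like this comes back empty on the live page, it may be worth printing `result.status_code` and the start of `result.text` to check what the server actually returned before changing the selectors.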
Edit: Managed to get it, see below:
from bs4 import BeautifulSoup
import requests
import csv

url = 'https://www.ncaa.com/rankings/soccer-women/d1/ncaa-womens-soccer-rpi'
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')

# Every table row; the first one is the header, so it is skipped below.
check = soup.find_all('tr')
names_lst = []
conference_lst = []
record_lst = []
for info in check[1:]:
    details = info.find_all('td')
    # Column 0 is the rank; columns 1-3 are school, conference, and record.
    names = details[1].text.strip()
    conference = details[2].text.strip()
    record = details[3].text.strip()
    names_lst.append(names)
    conference_lst.append(conference)
    record_lst.append(record)
print(names_lst)
print(conference_lst)
print(record_lst)

# newline='' keeps the csv module from writing blank lines between rows on Windows.
with open('ncaa_rankings.csv', 'w', newline='') as ncaa_file:
    csv_writer = csv.writer(ncaa_file)
    for names, conference, record in zip(names_lst, conference_lst, record_lst):
        csv_writer.writerow([names, conference, record])
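The three parallel lists can also be replaced by one tuple per row, which makes the CSV step a single `writerows` call. This is only a sketch of that variation. The `write_rankings` helper, the header names, and the sample filename are assumptions for illustration, and the two rows are copied from the table on the page.

```python
import csv


def write_rankings(path, rows, header=("School", "Conference", "Record")):
    """Write a header line followed by one CSV line per (school, conference, record) row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


# Sample rows standing in for the scraped values.
sample_rows = [
    ("Florida St.", "ACC", "22-1-2"),
    ("Duke", "ACC", "16-4-1"),
]
write_rankings("ncaa_rankings_sample.csv", sample_rows)
```

Keeping each row together avoids the `zip` step and means the three values can never drift out of alignment.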
This problem is solvable with 5 lines of code:
import pandas as pd
url = "https://www.ncaa.com/rankings/soccer-women/d1/ncaa-womens-soccer-rpi"
df = pd.read_html(url)[0]
df.to_csv("w_soccer_rpi.csv")
print(df)
Result (also saved in a csv file):
     Rank              School      Conference  Record    Road  Neutral    Home  Non Div I
0       1         Florida St.             ACC  22-1-2   6-1-1    4-0-0  12-0-1      0-0-0
1       2                Duke             ACC  16-4-1   4-1-1    0-0-0  12-3-0      0-0-0
2       3            Arkansas             SEC  19-4-1   4-3-1    4-1-0  11-0-0      0-0-0
3       4             Rutgers         Big Ten  19-4-2   6-1-0    0-1-0  13-2-2      0-0-0
4       5            Michigan         Big Ten  18-4-3   5-3-2    1-0-0  12-1-1      0-0-0
..    ...                 ...             ...     ...     ...      ...     ...        ...
337   338            Nicholls       Southland  0-18-0  0-10-0    0-2-0   0-6-0      0-0-0
338   339        Delaware St.  DI Independent  2-11-1   1-6-0    0-0-0   1-5-1      1-0-0
339   340    Mississippi Val.            SWAC  0-13-0   0-7-0    0-1-0   0-5-0      0-0-0
340   341             Hampton       Big South  1-13-1   0-8-0    0-0-0   1-5-1      0-0-0
341   342  South Carolina St.  DI Independent  0-10-1   0-4-1    0-0-0   0-6-0      2-1-0
Relevant pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
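One detail about the saved file: `to_csv` writes the DataFrame index (the 0, 1, 2, ... column in the printout) as an unnamed first column unless `index=False` is passed. The sketch below shows the difference on a small hand-built frame, so it does not fetch the page. The two rows are copied from the result above and the variable names are only illustrative.

```python
import pandas as pd

# Two rows copied from the printed result, built by hand instead of scraped.
df = pd.DataFrame({
    "Rank": [1, 2],
    "School": ["Florida St.", "Duke"],
    "Conference": ["ACC", "ACC"],
    "Record": ["22-1-2", "16-4-1"],
})

# With no path argument, to_csv returns the CSV text as a string.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # ,Rank,School,Conference,Record
print(without_index.splitlines()[0])  # Rank,School,Conference,Record
```

Since the table already has a Rank column, dropping the index may give a cleaner file, though that is a matter of preference.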