
How can I use Beautiful Soup to parse the batters' names?

I have had no issues grabbing three stats: hits, runs and RBIs. Here is the code I have been working with so far:

#import modules

from bs4 import BeautifulSoup
import requests, os
from selenium import webdriver

#start webdriver 

os.chdir(r'C:\webdrivers')
header = {'User-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}
options = webdriver.ChromeOptions()
options.add_argument("--start-maximized")
driver = webdriver.Chrome(options=options)
driver.get('https://www.baseball-reference.com/leagues/MLB/2018-standard-batting.shtml')

#grab html

soup = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

#parse three stats: rbi's, runs and hits

hits = [i.text for i in soup.find_all('td', {'data-stat': 'H'})]
runs = [i.text for i in soup.find_all('td', {'data-stat': 'R'})]
rbi = [i.text for i in soup.find_all('td', {'data-stat': 'RBI'})]

#print data

print(hits, runs, rbi)

The code above works great. When I try to grab the batters' names, however, I run into some problems: the names are not parsed correctly. I would like just their first and last names if possible.

Here is what I tried:

print(soup.find_all('td', {'data-stat': 'player'}))

The batters' names are in the output, but there is a lot of extra data. Also, my computer slowed down a lot when I ran this line of code. Any suggestions? Thanks in advance for any help you may offer!

How to get the batters' names:

Request the player data directly. I found this URL while watching the page load, and getting the player names from it is very easy:
https://d3k2oh6evki4b7.cloudfront.net/short/inc/players_search_list.csv
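
As a rough sketch of how names could be read out of that CSV once it has been downloaded: the column layout below (an ID in the first column, the full name in the second) is an assumption that has not been checked against the real file, so inspect a few rows first and adjust `name_column`. The helper `player_names` and the sample rows are hypothetical and only illustrate the idea.

```python
import csv
import io


def player_names(csv_text, name_column=1):
    """Return the value found in name_column for every row that has one."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[name_column].strip() for row in reader if len(row) > name_column]


# Hypothetical sample rows; the real file's layout is an assumption here.
sample = "examal01,Alex Example,2014-2018\nsampsa01,Sam Sample,2016-2018\n"

names = player_names(sample)
print(names)
```

To try it on the real file, the text returned by `requests.get(url).text` for the URL above could be passed in place of `sample`.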

How to speed up your code:

First: launching a browser through Selenium's webdriver takes up most of the run time in your code. For what you are scraping, I suggest using requests directly instead of Selenium.

Second: the lxml parser is faster than html.parser, but you need to install it if you have never used it; running "pip install lxml" will do that.
The "Installing a parser" section of the Beautiful Soup documentation summarizes the advantages and disadvantages of each parser library.

For example:

import requests
from bs4 import BeautifulSoup

# start requests
target_url = 'https://www.baseball-reference.com/leagues/MLB/2018-standard-batting.shtml'
headers = {'User-agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36'}

page_source = requests.get(target_url, headers=headers).text

#grab html

soup = BeautifulSoup(page_source, 'lxml')

#parse three stats: rbi's, runs and hits

hits = [i.text for i in soup.find_all('td', {'data-stat': 'H'})]
runs = [i.text for i in soup.find_all('td', {'data-stat': 'R'})]
rbi = [i.text for i in soup.find_all('td', {'data-stat': 'RBI'})]

#print data

print(hits, runs, rbi)
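
If the player cells do appear in the HTML you parse (as they did with the Selenium page source in the question), one possible way to keep only the names is to read the text of the link inside each cell rather than printing the whole tag. This is a minimal sketch on a hypothetical HTML fragment; the assumption that each `td` with `data-stat="player"` wraps the name in an `a` tag, possibly followed by a marker character, should be checked against the real page. The helper `batter_names` is illustrative, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating two rows of a batting table (made-up data).
sample_html = """
<table>
  <tr><td data-stat="player"><a href="/players/e/examal01.shtml">Alex Example</a></td><td data-stat="H">10</td></tr>
  <tr><td data-stat="player"><a href="/players/s/sampsa01.shtml">Sam Sample</a>*</td><td data-stat="H">20</td></tr>
</table>
"""


def batter_names(soup):
    """Collect the visible name from every player cell."""
    names = []
    for cell in soup.find_all('td', {'data-stat': 'player'}):
        link = cell.find('a')
        # Prefer the link text; fall back to the cell text if there is no link.
        text = link.get_text() if link else cell.get_text()
        names.append(text.strip())
    return names


soup = BeautifulSoup(sample_html, 'html.parser')
print(batter_names(soup))
```

Building a plain list of strings this way also avoids printing every full tag to the console, which may be part of why the original `print` call felt slow.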

The data is not in the page source; please refer to this link: https://d3k2oh6evki4b7.cloudfront.net/short/inc/players_search_list.csv

This is a CSV file. You can download it directly, or fetch the data you need with code.

