
Scraping multiple data tags from HTML using Beautiful Soup

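I am trying to scrape HTML to build a dictionary that maps each pitcher's name to his handedness. The data tags are buried, and so far I have only been able to collect the pitcher's name from the data set. The HTML output (for each player) looks like this: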

<div class="pitcher players">
<input name="import-data" type="hidden" value="%5B%7B%22slate_id%22%3A20190%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210893103%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%2C%7B%22slate_id%22%3A20192%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210894893%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%2C%7B%22slate_id%22%3A20193%2C%22type%22%3A%22classic%22%2C%22player_id%22%3A%2210895115%22%2C%22salary%22%3A%2211800%22%2C%22position%22%3A%22SP%22%2C%22fpts%22%3A14.96%7D%5D"/>
<a class="player-popup" data-url="https://rotogrinders.com/players/johnny-cueto-11193?site=draftkings" href="https://rotogrinders.com/players/johnny-cueto-11193">Johnny Cueto</a>
<span class="meta stats">
<span class="stats">
            R
        </span>
<span class="salary" data-role="salary" data-salary="$11.8K">
            $11.8K
        </span>
<span class="fpts" data-fpts="14.96" data-product="56" data-role="authorize" title="Projected Points">14.96</span>

I have tinkered with it and come up empty. I'm sure I'm overthinking this. Here is my code so far:

import requests
from bs4 import BeautifulSoup

url = "https://rotogrinders.com/lineups/mlb?site=draftkings"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

players_confirmed = {}
results = [soup.find_all("div", {'class':'pitcher players'})]

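What is the best way to iterate over the result set to get the more detailed tag information I need?

I need the text of the a tag (class "player-popup") in the HTML, and the handedness from the span tag with class "stats". Ideally, I would end up with a dictionary like this:

{Johnny Cueto: R, Player 2: L, ...}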

import requests
from bs4 import BeautifulSoup

url = "https://rotogrinders.com/lineups/mlb?site=draftkings"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

# Each pitcher block holds the name in its <a> tag. The handedness is in the
# second element matching ".stats" (the first match is the outer "meta stats" span).
results = soup.find_all("div", {"class": "pitcher players"})
players_confirmed = {}
for j in results:
    players_confirmed[j.a.text] = j.select(".stats")[1].text.strip()

Just call select or find on each element you found; the result of find_all is a list of tags that you can iterate over.
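The same idea can be tried offline against the markup from the question. The following is only a sketch: SAMPLE_HTML is a trimmed copy of the posted snippet with closing tags added, and pitcher_hands is a hypothetical helper name, not something from the original post. It assumes the handedness always sits in a span with class "stats" directly inside the span with class "meta stats", which the live page may not guarantee.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the per-player markup from the question (closing tags added).
SAMPLE_HTML = """
<div class="pitcher players">
<a class="player-popup" href="https://rotogrinders.com/players/johnny-cueto-11193">Johnny Cueto</a>
<span class="meta stats">
<span class="stats">
            R
        </span>
<span class="salary" data-role="salary" data-salary="$11.8K">
            $11.8K
        </span>
</span>
</div>
"""


def pitcher_hands(html):
    """Map each pitcher's name to the text of the inner <span class="stats">."""
    soup = BeautifulSoup(html, "html.parser")
    hands = {}
    for block in soup.find_all("div", class_="pitcher players"):
        name_tag = block.find("a", class_="player-popup")
        # The child selector targets only the inner span, so no positional
        # index such as [1] is needed.
        hand_tag = block.select_one("span.meta.stats > span.stats")
        if name_tag is None or hand_tag is None:
            continue  # skip blocks that lack either piece
        hands[name_tag.get_text(strip=True)] = hand_tag.get_text(strip=True)
    return hands


print(pitcher_hands(SAMPLE_HTML))  # {'Johnny Cueto': 'R'}
```

Selecting by structure rather than by position means a block with a missing span is skipped instead of raising an IndexError, at the cost of depending on the exact nesting shown in the question.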
