How can I split these <td> tags into groups with BeautifulSoup in Python?
I am scraping a website and I'm having a hard time figuring this out.
I am trying to split the tags into two groups, so that when I run a for loop the groups are:
# Group 1
<td class="right endpoint tooltip" data-endpoint="/players/pgl_cum_stats.cgi?player=adebaba01&year=2023&date_game=2022-10-19&is_playoff_game=N" data-stat="game_season"><strong>1</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210190MIA.html">2022-10-19</a></td>
<td class="right" data-stat="age">25-093</td>
<td class="left" data-stat="team_id"><a href="/teams/MIA/2023.html">MIA</a></td>
<td class="center iz" data-stat="game_location"></td>
<td class="left" data-stat="opp_id"><a href="/teams/CHI/2023.html">CHI</a></td>
<td class="center" csk="-8" data-stat="game_result">L (-8)</td>
<td class="right" data-stat="gs">1</td>
<td class="right" csk="2040" data-stat="mp">34:00</td>
<td class="right" data-stat="fg">5</td>
<td class="right" data-stat="fga">15</td>
<td class="right" data-stat="fg_pct">.333</td>
<td class="right iz" data-stat="fg3">0</td>
<td class="right iz" data-stat="fg3a">0</td>
<td class="right iz" data-stat="fg3_pct"></td>
<td class="right" data-stat="ft">2</td>
<td class="right" data-stat="fta">3</td>
<td class="right" data-stat="ft_pct">.667</td>
<td class="right" data-stat="orb">1</td>
<td class="right" data-stat="drb">8</td>
<td class="right" data-stat="trb">9</td>
<td class="right" data-stat="ast">2</td>
<td class="right iz" data-stat="stl">0</td>
<td class="right" data-stat="blk">1</td>
<td class="right" data-stat="tov">5</td>
<td class="right" data-stat="pf">4</td>
<td class="right" data-stat="pts">12</td>
<td class="right" data-stat="game_score">1.7</td>
<td class="right" data-stat="plus_minus">-15</td>
# Group 2
<td class="right endpoint tooltip" data-endpoint="/players/pgl_cum_stats.cgi?player=adebaba01&year=2023&date_game=2022-10-21&is_playoff_game=N" data-stat="game_season"><strong>2</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210210MIA.html">2022-10-21</a></td>
<td class="right" data-stat="age">25-095</td>
<td class="left" data-stat="team_id"><a href="/teams/MIA/2023.html">MIA</a></td>
<td class="center iz" data-stat="game_location"></td>
<td class="left" data-stat="opp_id"><a href="/teams/BOS/2023.html">BOS</a></td>
<td class="center" csk="-7" data-stat="game_result">L (-7)</td>
<td class="right" data-stat="gs">1</td>
<td class="right" csk="2093" data-stat="mp">34:53</td>
<td class="right" data-stat="fg">8</td>
<td class="right" data-stat="fga">11</td>
<td class="right" data-stat="fg_pct">.727</td>
<td class="right iz" data-stat="fg3">0</td>
<td class="right iz" data-stat="fg3a">0</td>
<td class="right iz" data-stat="fg3_pct"></td>
<td class="right" data-stat="ft">3</td>
<td class="right" data-stat="fta">4</td>
<td class="right" data-stat="ft_pct">.750</td>
<td class="right" data-stat="orb">3</td>
<td class="right" data-stat="drb">5</td>
<td class="right" data-stat="trb">8</td>
<td class="right" data-stat="ast">5</td>
<td class="right" data-stat="stl">2</td>
<td class="right iz" data-stat="blk">0</td>
<td class="right" data-stat="tov">5</td>
<td class="right" data-stat="pf">4</td>
<td class="right" data-stat="pts">19</td>
<td class="right" data-stat="game_score">16.6</td>
<td class="right" data-stat="plus_minus">+20</td>
I will then put these two groups into a 2D list.
I hope that makes sense. Any help or feedback will be greatly appreciated!
I tried:
```python
stats = player_header.find_all('td')
for stat in stats:
    print(stat.text)
```
This prints every cell in order, but I can't work out how to break the cells into separate groups.
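A minimal sketch of one way to build the 2D list directly, assuming (as in both groups shown) that every game begins with the cell whose `data-stat` is `game_season`. The `html_doc` string is a trimmed, hypothetical stand-in for the real page, and `group_cells` is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the page: two games, three cells each.
html_doc = """
<td class="right endpoint tooltip" data-stat="game_season"><strong>1</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210190MIA.html">2022-10-19</a></td>
<td class="right" data-stat="pts">12</td>
<td class="right endpoint tooltip" data-stat="game_season"><strong>2</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210210MIA.html">2022-10-21</a></td>
<td class="right" data-stat="pts">19</td>
"""


def group_cells(html, start_stat="game_season"):
    """Split a flat run of <td> cells into one list per game.

    A new group starts at every cell whose data-stat equals start_stat.
    """
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    for cell in soup.find_all("td"):
        if cell.get("data-stat") == start_stat:
            groups.append([])  # a game_season cell opens a new row
        if groups:  # ignore any cells that appear before the first game
            groups[-1].append(cell.get_text())
    return groups


rows = group_cells(html_doc)
print(rows)  # [['1', '2022-10-19', '12'], ['2', '2022-10-21', '19']]
```

Because the split is keyed on the `data-stat` attribute rather than on a fixed cell count, the same loop handles any number of games.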
This approach assumes that the only HTML is what you have shared (html_doc is your HTML above).
Here's the approach: use insert_before() to put a new wrapper <div> in front of each group (a group starts at the cell with class right endpoint tooltip), then walk forward through the following cells to build the two lists.
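```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "html.parser")

# Mark the start of every group with an empty <div class="wrapper">.
for tag in soup.find_all("td", class_="right endpoint tooltip"):
    tag.insert_before(soup.new_tag("div", **{"class": "wrapper"}))

starts = soup.find_all("td", class_="right endpoint tooltip")

# Group 1: the first cell, then each following <td> up to the next wrapper.
# Only <td>/<div> are collected, so nested <strong>/<a> text isn't duplicated.
out1 = [starts[0].text]
for tag in starts[0].find_all_next(["td", "div"]):
    if tag.name == "div":  # reached the wrapper in front of group 2
        break
    out1.append(tag.text)

# Group 2: the second cell and every <td> after it.
out2 = [starts[1].text]
for tag in starts[1].find_all_next("td"):
    out2.append(tag.text)
```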
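On the live page the cells are probably wrapped in one `<tr>` per game; that is an assumption, since the question shows only the `<td>` cells. If it holds, each row already is a group and no wrapper tags are needed. A minimal sketch on a hypothetical two-row table:

```python
from bs4 import BeautifulSoup

# Assumed structure: each game is its own <tr> (not shown in the question).
html_doc = """
<table>
  <tbody>
    <tr><td data-stat="game_season"><strong>1</strong></td><td data-stat="pts">12</td></tr>
    <tr><td data-stat="game_season"><strong>2</strong></td><td data-stat="pts">19</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# One inner list per <tr>; rows with no <td> cells (e.g. <th>-only headers) are skipped.
rows = []
for tr in soup.find_all("tr"):
    cells = tr.find_all("td")
    if cells:
        rows.append([td.get_text() for td in cells])

# The same rows as dicts keyed by data-stat, so columns can be looked up by name.
games = [
    {td["data-stat"]: td.get_text() for td in tr.find_all("td")}
    for tr in soup.find_all("tr")
    if tr.find("td")
]

print(rows)   # [['1', '12'], ['2', '19']]
print(games)  # [{'game_season': '1', 'pts': '12'}, {'game_season': '2', 'pts': '19'}]
```

Keying by `data-stat` also keeps columns aligned when a cell is empty, such as `fg3_pct` in both groups above.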