
How can I split these <td> tags from BeautifulSoup in Python?

I am scraping a website and am having a hard time working out how to group the results.

I am trying to split the tags into two groups, so that when I run a for loop the output is:

# Group 1
<td class="right endpoint tooltip" data-endpoint="/players/pgl_cum_stats.cgi?player=adebaba01&amp;year=2023&amp;date_game=2022-10-19&amp;is_playoff_game=N" data-stat="game_season"><strong>1</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210190MIA.html">2022-10-19</a></td>
<td class="right" data-stat="age">25-093</td>
<td class="left" data-stat="team_id"><a href="/teams/MIA/2023.html">MIA</a></td>
<td class="center iz" data-stat="game_location"></td>
<td class="left" data-stat="opp_id"><a href="/teams/CHI/2023.html">CHI</a></td>
<td class="center" csk="-8" data-stat="game_result">L (-8)</td>
<td class="right" data-stat="gs">1</td>
<td class="right" csk="2040" data-stat="mp">34:00</td>
<td class="right" data-stat="fg">5</td>
<td class="right" data-stat="fga">15</td>
<td class="right" data-stat="fg_pct">.333</td>
<td class="right iz" data-stat="fg3">0</td>
<td class="right iz" data-stat="fg3a">0</td>
<td class="right iz" data-stat="fg3_pct"></td>
<td class="right" data-stat="ft">2</td>
<td class="right" data-stat="fta">3</td>
<td class="right" data-stat="ft_pct">.667</td>
<td class="right" data-stat="orb">1</td>
<td class="right" data-stat="drb">8</td>
<td class="right" data-stat="trb">9</td>
<td class="right" data-stat="ast">2</td>
<td class="right iz" data-stat="stl">0</td>
<td class="right" data-stat="blk">1</td>
<td class="right" data-stat="tov">5</td>
<td class="right" data-stat="pf">4</td>
<td class="right" data-stat="pts">12</td>
<td class="right" data-stat="game_score">1.7</td>
<td class="right" data-stat="plus_minus">-15</td>

# Group 2
<td class="right endpoint tooltip" data-endpoint="/players/pgl_cum_stats.cgi?player=adebaba01&amp;year=2023&amp;date_game=2022-10-21&amp;is_playoff_game=N" data-stat="game_season"><strong>2</strong></td>
<td class="left" data-stat="date_game"><a href="/boxscores/202210210MIA.html">2022-10-21</a></td>
<td class="right" data-stat="age">25-095</td>
<td class="left" data-stat="team_id"><a href="/teams/MIA/2023.html">MIA</a></td>
<td class="center iz" data-stat="game_location"></td>
<td class="left" data-stat="opp_id"><a href="/teams/BOS/2023.html">BOS</a></td>
<td class="center" csk="-7" data-stat="game_result">L (-7)</td>
<td class="right" data-stat="gs">1</td>
<td class="right" csk="2093" data-stat="mp">34:53</td>
<td class="right" data-stat="fg">8</td>
<td class="right" data-stat="fga">11</td>
<td class="right" data-stat="fg_pct">.727</td>
<td class="right iz" data-stat="fg3">0</td>
<td class="right iz" data-stat="fg3a">0</td>
<td class="right iz" data-stat="fg3_pct"></td>
<td class="right" data-stat="ft">3</td>
<td class="right" data-stat="fta">4</td>
<td class="right" data-stat="ft_pct">.750</td>
<td class="right" data-stat="orb">3</td>
<td class="right" data-stat="drb">5</td>
<td class="right" data-stat="trb">8</td>
<td class="right" data-stat="ast">5</td>
<td class="right" data-stat="stl">2</td>
<td class="right iz" data-stat="blk">0</td>
<td class="right" data-stat="tov">5</td>
<td class="right" data-stat="pf">4</td>
<td class="right" data-stat="pts">19</td>
<td class="right" data-stat="game_score">16.6</td>
<td class="right" data-stat="plus_minus">+20</td>

I will then put these two groups into a 2D list.

I hope that makes sense. Any help or feedback will be greatly appreciated!

I tried:

stats = player_header.find_all('td')
for stat in stats:
    print (stat.text)

But I cannot group these tags or break them into separate groups.

This approach assumes that the only HTML is what you have shared (html_doc is your HTML above).

Here's the approach: use insert_before() to insert a new wrapper element before each group, where a group starts at the td with the class right endpoint tooltip, and then build two lists accordingly.


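from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, "html.parser")

# Mark the start of each group with a wrapper <div>.
starts = soup.find_all("td", class_="right endpoint tooltip")
for tag in starts:
    tag.insert_before(soup.new_tag("div", **{"class": "wrapper"}))

# Group 1: from the first start cell up to the next wrapper.
out1 = [starts[0].text]
for tag in starts[0].find_all_next():
    if "wrapper" in tag.attrs.get("class", []):
        break  # reached the start of the second group
    if tag.name == "td":  # skip nested <strong>/<a> tags to avoid duplicate text
        out1.append(tag.text)

# Group 2: the second start cell and every <td> after it.
out2 = [starts[1].text] + [tag.text for tag in starts[1].find_all_next("td")]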
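
Since the goal is a 2D list, the same idea can be written so that it handles any number of groups. The following is only a minimal sketch, not part of the original answer: split_rows and its start_stat parameter are hypothetical names, and it assumes every group begins with a cell whose data-stat is game_season, as in the HTML shown in the question. It works on plain (data-stat, text) pairs so that the grouping logic stays independent of the parser.

```python
def split_rows(cells, start_stat="game_season"):
    """Group (data_stat, text) pairs into rows.

    A new row starts at every cell whose data-stat equals `start_stat`
    (an assumption based on the HTML in the question).
    """
    rows = []
    for stat, text in cells:
        # Open a new row at each start cell, or if no row exists yet.
        if stat == start_stat or not rows:
            rows.append([])
        rows[-1].append(text)
    return rows


# A shortened stand-in for the cells in the question.
cells = [
    ("game_season", "1"), ("date_game", "2022-10-19"), ("pts", "12"),
    ("game_season", "2"), ("date_game", "2022-10-21"), ("pts", "19"),
]
rows = split_rows(cells)
print(rows)
```

With BeautifulSoup, the input pairs could be built from the parsed document with something like [(td.get("data-stat"), td.get_text(strip=True)) for td in soup.find_all("td")], after which rows[0] and rows[1] correspond to Group 1 and Group 2.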
