The Pythonic way to parse a small piece of HTML with BeautifulSoup?

What's the most Pythonic way to parse the HTML below using BeautifulSoup?

<html>
<body>
  <div class="bet_group">
    <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
      <!-- -->
    </div>
    <div class="bets betCols2">
      <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
      <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
      <div class="bets__empty-cell"> </div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
    </div>
  </div>
</body>
</html>

I'm trying to get this output:

Title: Total

Total Over 4.5: 3.38, Total Under 4.5: 1.34

Total Over 5.5: 12.5, Total Under 5.5: 1.02

I've tried the following code, but it doesn't quite get there.

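from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'lxml')

infos = soup.find_all('span', class_='bet_type')
for info in infos:
    info.get_text()  # returns the bet name, but the value is discarded
odds = soup.find_all('span', class_='koeff')
for odd in odds:
    odd.get_text()  # returns the coefficient, also discarded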

Try:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'lxml')
output = ""
for i in soup.find("div", class_="bet_group").text.splitlines():
    if i.strip():
        output += i.strip()+"\n"
print(output)

Output:

Total
Total Over 4.5 3.38
Total Under 4.5 1.34
Total Over 5.5 12.5
Total Under 5 1.04
Total Under 5.5 1.02
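
The line-splitting approach above depends on how the markup happens to be broken into lines. As an alternative, here is a minimal sketch that walks each row and reads the name and the coefficient from their own tags. It assumes every real row holds one `span.bet_type` and one `span.koeff`, works on a shortened copy of the question's HTML, and uses the built-in `html.parser` backend instead of `lxml`. The helper name `parse_bets` is made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the question's markup (assumption: same structure as the full page).
html = """
<div class="bet_group">
  <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total <!-- --> </div>
  <div class="bets betCols2">
    <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
    <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
    <div class="bets__empty-cell"> </div>
  </div>
</div>
"""


def parse_bets(markup):
    """Return (title, [(bet name, coefficient), ...]) for the first bet_group."""
    soup = BeautifulSoup(markup, "html.parser")
    group = soup.find("div", class_="bet_group")
    title = group.find("div", class_="bet-title").get_text(strip=True)
    rows = []
    # Direct children of the "bets" container are the individual cells.
    for cell in group.select("div.bets > div"):
        name = cell.find("span", class_="bet_type")
        coef = cell.find("span", class_="koeff")
        if name is None or coef is None:
            continue  # skip the empty placeholder cell
        rows.append((name.get_text(strip=True), coef["data-coef"]))
    return title, rows


title, rows = parse_bets(html)
print(f"Title: {title}")
for name, coef in rows:
    print(f"{name}: {coef}")
```

Reading `data-coef` instead of the inner `<i>` text is a design choice for this sketch; both carry the same value in the sample markup.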

Maybe this will help you:

    from bs4 import BeautifulSoup

    st = """
        <html>

<body>
  <div class="bet_group">
    <div class="bet-title bet-title_justify"><span class="bet-title__star"></span> Total
      <!-- -->
    </div>
    <div class="bets betCols2">
      <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
      <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
      <div class="bets__empty-cell"> </div>
      <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
    </div>
  </div>
</body>

</html>
    """
    soup = BeautifulSoup(st, 'lxml')
    title = soup.find('div', attrs={'class': 'bet-title'}).get_text().strip()
    print(title)
    for spn in soup.find_all('span', attrs={'class': 'bet_type'}):
        bet_text = spn.get_text()
        print(bet_text)


    # Output as: Total
    #            Total Over 4.5
    #            Total Under 4.5
    #            Total Over 5.5
    #            Total Under 5
    #            Total Under 5.5
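
The code above prints the bet names only. To get closer to the layout the question asks for, one possible sketch pairs each name with the coefficient next to it and then groups entries by the number that ends the bet name. Grouping on that trailing number is an assumption about how the Over/Under rows belong together, and the `html.parser` backend and the variable names are choices made for this example.

```python
import re
from collections import defaultdict

from bs4 import BeautifulSoup

# Bet rows copied from the question's HTML; the wrapper markup is omitted.
html = """
<div class="bets betCols2">
  <div class=""><span class="bet_type" data-type="9">Total Over 4.5</span> <span class="koeff" data-coef="3.38"><i>3.38</i></span></div>
  <div class=""><span class="bet_type" data-type="10">Total Under 4.5</span> <span class="koeff" data-coef="1.34"><i>1.34</i></span></div>
  <div class=""><span class="bet_type" data-type="9">Total Over 5.5</span> <span class="koeff" data-coef="12.5"><i>12.5</i></span></div>
  <div class=""><span class="bet_type" data-type="10">Total Under 5</span> <span class="koeff" data-coef="1.04"><i>1.04</i></span></div>
  <div class="bets__empty-cell"> </div>
  <div class=""><span class="bet_type" data-type="10">Total Under 5.5</span> <span class="koeff" data-coef="1.02"><i>1.02</i></span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect "name: coefficient" strings, keyed by the number that ends each bet name.
lines_by_threshold = defaultdict(list)
for name_tag in soup.find_all("span", class_="bet_type"):
    coef_tag = name_tag.find_next_sibling("span", class_="koeff")
    if coef_tag is None:
        continue  # no coefficient next to this name
    name = name_tag.get_text(strip=True)
    match = re.search(r"\d+(?:\.\d+)?$", name)
    threshold = match.group() if match else name
    lines_by_threshold[threshold].append(f"{name}: {coef_tag.get_text(strip=True)}")

# One output line per threshold, in order of first appearance.
paired_lines = [", ".join(entries) for entries in lines_by_threshold.values()]
print("\n".join(paired_lines))
```

With the sample rows, `Total Under 5` has no matching Over entry and ends up on a line of its own; filter out groups with fewer than two entries if only complete pairs are wanted.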
