
Scraping a table using Beautiful Soup in Python

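  • How can I use find_all in a for loop to access the <tr> tags inside <tbody>? Each <tr> appears to be independent of the others, and the rows have alternating classes 'even' and 'odd'. I can only pass two arguments to find_all: find_all('tr', class_='odd') (or 'even').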

  • Also, how can I access the 1st, 3rd, 4th and 6th <td> of a row separately? The tags have no id or class.

Code:

from bs4 import BeautifulSoup
import requests
src_code = requests.get('https://bschool.careers360.com/colleges/ranking/2018').text
soup = BeautifulSoup(src_code, features="html.parser")

i = 1
for trr in soup.find_all('tr', class_='odd'):
    i+=1
    college = trr.td.a.text
    print(college)
    if i%2==0:
        class_='even'
    else:
        class_='odd'

You can find the parent tag first.

from bs4 import BeautifulSoup
import requests
src_code = requests.get('https://bschool.careers360.com/colleges/ranking/2018').content
soup = BeautifulSoup(src_code, features="html5lib")
trs=soup.find(name = "div",id="related-results").find_all(name = "tr")
trs

trs is what you want:

[<tr><th>College Name</th><th>Rank</th><th>Overall Score</th><th>Rating</th><th>Ownership</th><th>Intake Exams</th><th></th></tr>,
 <tr class="odd"><td><a href="https://www.careers360.com/university/indian-institute-of-management-ahmedabad">Indian Institute of Management Ahmedabad</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">1</span></td><td><span class="overall_scoredata">427.92</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.7 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">CAT</span><ul><li>GMAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-7057"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&amp;nid=7057&amp;flag=bookmarks&amp;click_location=follow_button&amp;popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="7057"/><span></span> <i>Compare</i> </label></div></td></tr>,
 <tr class="even"><td><a href="https://www.careers360.com/university/indian-institute-of-management-bangalore">Indian Institute of Management Bangalore</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">2</span></td><td><span class="overall_scoredata">408.32</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.1 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">CAT</span><ul><li>GMAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-6872"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&amp;nid=6872&amp;flag=bookmarks&amp;click_location=follow_button&amp;popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="6872"/><span></span> <i>Compare</i> </label></div></td></tr>,
 <tr class="odd"><td><a href="https://www.careers360.com/university/indian-institute-of-management-calcutta">Indian Institute of Management Calcutta</a><br/></td><td><span class="serialNum circlerate Government"></span><span class="rankStyle">3</span></td><td><span class="overall_scoredata">375.18</span></td><td>AAAAA<div class="rankInfo"> <strong>2017 Rating: </strong> AAAAA</div></td><td><div class="ownership_name">Government</div><div class="rating_review rankInfo"><strong>User Rating: </strong>4.9 / 5</div></td><td><div class="showMoreCheck"> <input type="checkbox"/><div class="ranked_best_branch intakeExam"><div class="intakeExam ng-binding"><span class="best_branch plusMinus">GMAT</span><ul><li>CAT</li></ul></div></div></div></td><td><div class="rank-apply-button btnBlockInfo"><div class="flagging" id="divid-6933"><div class="flag-link flag-default-link"><a class="buttonDefault follow iframe-popup-button" href="/user/register?destination=colleges/ranking/2018&amp;nid=6933&amp;flag=bookmarks&amp;click_location=follow_button&amp;popup=iframe">Follow</a></div></div><div class="client_url"></div></div><div class="college-compare-checkbox combine-rating-block smallclListing"> <label> <input class="tmCheckbox" name="college_ranking" type="checkbox" value="6933"/><span></span> <i>Compare</i> </label></div></td></tr>,
......
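
One caveat with this approach, sketched here on a cut-down sample rather than the live page: find_all("tr") under the parent also returns the header row, which holds <th> cells, so row.td is None for it. The sample HTML and the names container and names are assumptions made for illustration; only the id related-results and the general row layout come from the output above.

```python
from bs4 import BeautifulSoup

# Cut-down sample (an assumption, not the real page): one header row,
# two data rows inside the target div, and one unrelated row outside it.
SAMPLE_HTML = """
<div id="related-results"><table>
<tr><th>College Name</th><th>Rank</th></tr>
<tr class="odd"><td><a href="#">Indian Institute of Management Ahmedabad</a></td><td>1</td></tr>
<tr class="even"><td><a href="#">Indian Institute of Management Bangalore</a></td><td>2</td></tr>
</table></div>
<table><tr class="odd"><td><a href="#">Unrelated row</a></td></tr></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Restrict the search to the parent container first.
container = soup.find("div", id="related-results")
trs = container.find_all("tr")

# The header row contains only <th>, so row.td is None there; skip it.
names = [row.td.a.get_text(strip=True) for row in trs if row.td is not None]
print(names)
```

Filtering on row.td is one way to drop the header; slicing with trs[1:] would also work if the header is known to be the first row.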

find_all("tr",class_=['odd','even'])

This gets all the tr tags with either class; from each row it then takes the first td, the a tag inside it, and that tag's text.

from bs4 import BeautifulSoup
import requests
src_code = requests.get('https://bschool.careers360.com/colleges/ranking/2018').text
soup = BeautifulSoup(src_code, features="html.parser")

alltr=soup.find_all("tr",class_=['odd','even'])

for x in alltr:
    print(x.td.a.text)
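
For the second part of the question (cells without an id or class), one option is to index each row's <td> list by position. This is a minimal sketch on a simplified sample modeled on the rows shown earlier, not on the live page; pick_cells and its positions parameter are hypothetical helpers, and positions are 1-based to match the question's wording.

```python
from bs4 import BeautifulSoup

# Simplified sample (an assumption): seven cells per row, as in the real
# table, but with most of the nested markup removed.
SAMPLE_HTML = """
<table><tbody>
<tr><th>College Name</th><th>Rank</th><th>Overall Score</th><th>Rating</th><th>Ownership</th><th>Intake Exams</th><th></th></tr>
<tr class="odd"><td><a href="#">Indian Institute of Management Ahmedabad</a></td><td><span class="rankStyle">1</span></td><td><span class="overall_scoredata">427.92</span></td><td>AAAAA</td><td><div class="ownership_name">Government</div></td><td><span>CAT</span><ul><li>GMAT</li></ul></td><td>Follow</td></tr>
<tr class="even"><td><a href="#">Indian Institute of Management Bangalore</a></td><td><span class="rankStyle">2</span></td><td><span class="overall_scoredata">408.32</span></td><td>AAAAA</td><td><div class="ownership_name">Government</div></td><td><span>CAT</span><ul><li>GMAT</li></ul></td><td>Follow</td></tr>
</tbody></table>
"""


def pick_cells(row, positions=(1, 3, 4, 6)):
    """Return the text of the cells at the given 1-based positions."""
    # recursive=False keeps only the row's own <td> children.
    cells = row.find_all("td", recursive=False)
    return [cells[p - 1].get_text(" ", strip=True) for p in positions]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Passing a list to class_ matches rows with either class; the header row
# has no class and is therefore skipped.
rows = soup.find_all("tr", class_=["odd", "even"])
picked = [pick_cells(row) for row in rows]
for cells in picked:
    print(cells)
```

On the real page some cells hold extra nested text (for example the previous year's rating), so the extracted strings may need further trimming there.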

