BeautifulSoup doesn't fully parse the page

import requests
from bs4 import BeautifulSoup as bs

url1 = 'https://school.karelia.ru/auth/login'
url2 = 'https://school.karelia.ru/personal-area/#diary'

payload = {
    'login_login': 'КлочковМ',
    'login_password': 'КлочковМ7'
}

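def getHW():
    # Reuse one session so the login cookies carry over to the next request.
    with requests.Session() as s:
        s.post(url1, data=payload)  # log in
        r = s.get(url2)  # request the personal-area page
        soup = bs(r.content, 'html.parser')
        print(soup.find_all("div"))  # print every <div> found in the response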

getHW()

I am trying to parse a site, and this code doesn't do it fully. The site's HTML contains many more nested elements than the result I get from this code:

<div class="right" id="main-region"></div>

For some reason, the div with class "right" just ends there, even though on the site it contains a lot more. Why could this be?

This happens because you called soup.find_all("div"). The div ends there with </div>, and you told BS to look only for divs, so BS stops there. To search by class instead, see for example this answer.
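As a rough illustration of searching by class, here is a minimal sketch. The `SAMPLE_HTML` string and the `find_by_class` helper are hypothetical stand-ins (the real page's markup is not shown in the post), and the "lesson" class is invented for the example; only `find_all` with the `class_` keyword and `select_one` are actual Beautiful Soup calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the fetched page; the real site's
# HTML is not shown in the question, so the inner elements are invented.
SAMPLE_HTML = """
<div class="left" id="menu"><a href="#diary">Diary</a></div>
<div class="right" id="main-region">
  <div class="lesson">Math</div>
  <div class="lesson">History</div>
</div>
"""


def find_by_class(html, class_name):
    """Return every <div> whose class attribute includes class_name."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with a trailing underscore) filters on the CSS class,
    # because "class" itself is a reserved word in Python.
    return soup.find_all("div", class_=class_name)


right_divs = find_by_class(SAMPLE_HTML, "right")
lesson_texts = [d.get_text(strip=True) for d in find_by_class(SAMPLE_HTML, "lesson")]

# A CSS selector is an alternative way to express the same lookup.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
main_region = soup.select_one("div.right#main-region")

print(len(right_divs))
print(lesson_texts)
print(main_region["id"])
```

Each returned tag carries whatever children were present in the HTML that was parsed, so if an element comes back empty it may be worth checking whether it is also empty in `r.content` itself.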

