
BeautifulSoup can't find video or certain div tags

I'm trying to scrape the video source from a page, but BeautifulSoup doesn't find the 'video' tag at all and I don't know why. This is the element I'm after:

    <video class="jw-video jw-reset" disableremoteplayback="" webkit-playsinline="" playsinline="" jw-loaded="data" src="randomsrc2" jw-played="" style="object-fit: fill;"></video>

    #web scraping stuff
    import bs4 as bs
    import urllib.request

    url = 'https://gostream.is/film/cars-3-21095/watching.html?ep=682669'
    user_agent = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; '
                  'rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7')

    headers={'User-Agent':user_agent,}

    q = urllib.request.Request(url, headers=headers)
    sauce = urllib.request.urlopen(q).read()
    soup = bs.BeautifulSoup(sauce,'lxml')
    print(soup)

    # Save the parsed HTML so it can be searched by hand
    with open('testd2.txt', 'w', encoding='utf-8') as f:
        f.write(str(soup))  # searching this file for 'video' finds nothing

    video = soup.find('video')
    print(video)  # prints None

In Firefox, go to about:config, set javascript.enabled to false, and open your link. If the video tag doesn't show up in the browser, it is being inserted at run time by JavaScript. urllib.request only downloads the HTML the server sends; it doesn't execute JavaScript, so the tag never appears in what you parse.
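The same check can be done from Python without touching browser settings: parse the raw HTML and count how often the tag occurs. The sketch below uses only the standard library's html.parser so it runs without extra installs. TagCounter, count_tag, and the two sample strings are illustrative names and data, not part of the original code.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how many times a given tag opens in an HTML string."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook
        if tag == self.tag:
            self.count += 1


def count_tag(html, tag):
    """Return the number of opening `tag` elements found in `html`."""
    parser = TagCounter(tag)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical stand-ins: what a server might send vs. what a browser ends up showing
raw_html = '<div id="player"></div><script>/* player built here */</script>'
rendered_html = '<div id="player"><video src="randomsrc2"></video></div>'

print(count_tag(raw_html, "video"))
print(count_tag(rendered_html, "video"))
```

With the question's code, count_tag(sauce.decode('utf-8', errors='replace'), 'video') returning 0 would suggest the tag is not in the downloaded HTML and is added later by a script.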

To get the rendered page you need a real browser driven by Selenium. In that case, change your code as below:

    import bs4 as bs
    from selenium import webdriver

    driver = webdriver.Firefox()  # opens a real Firefox window
    url = 'https://gostream.is/film/cars-3-21095/watching.html?ep=682669'
    driver.get(url)
    sauce = driver.page_source  # HTML of the page as the browser currently has it
    soup = bs.BeautifulSoup(sauce, 'lxml')

You can even drop BeautifulSoup altogether and query the page through Selenium directly:

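    from selenium.webdriver.common.by import By
    # find_elements_by_tag_name was removed in newer Selenium 4 releases
    for elem in driver.find_elements(By.TAG_NAME, "video"):
        print(elem.get_attribute("src"))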
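One caveat with the loop above: driver.get typically returns once the page has loaded, but a script-built player may add the video element a little later, so the loop can still come up empty. Selenium provides WebDriverWait for this. The sketch below shows the same polling idea in plain Python so it can be tried without a browser. wait_for_nonempty and its parameters are hypothetical names introduced here, and the fake page data is made up for the demonstration.

```python
import time


def wait_for_nonempty(fetch, timeout=10.0, poll=0.5):
    """Call fetch() repeatedly until it returns a non-empty list or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            return []  # gave up: nothing appeared within the timeout
        time.sleep(poll)


# Intended use with the Selenium driver from the answer (not run here):
# from selenium.webdriver.common.by import By
# sources = wait_for_nonempty(
#     lambda: [e.get_attribute("src")
#              for e in driver.find_elements(By.TAG_NAME, "video")]
# )

# Stand-in for a page whose video tag only shows up on the third look
attempts = iter([[], [], ["randomsrc2"]])
print(wait_for_nonempty(lambda: next(attempts), timeout=5.0, poll=0.01))
```

In real scraping code, WebDriverWait with an expected condition is the more usual choice; the hand-written loop is only meant to make the retry logic visible.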
