
Playwright does not load all of the HTML (Python)

I'm just trying to scrape the titles from the page, but the HTML returned by page.inner_html('body') does not include all of the page's HTML. I think the missing part may be loaded by JavaScript, but when I look in the Network tab in dev tools I cannot find a JSON response or any other request it is loaded from. I have tried this with Selenium as well, so there must be something I'm doing fundamentally wrong.

So none of the items in the list appear, although the rest of the HTML shows up fine. No amount of waiting for the content to load makes the information appear.

# Import Playwright's synchronous API
from playwright.sync_api import sync_playwright

url = 'https://order.mandarake.co.jp/order/listPage/list?categoryCode=07&keyword=naruto&lang=en'

with sync_playwright() as p:
    # Launch a visible Chromium window (JavaScript is enabled by default)
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Open the URL
    page.goto(url)

    # Wait until network activity has settled
    page.wait_for_load_state("networkidle")

    # Get the HTML content of <body>
    html = page.inner_html("body")

    print(html)

    # Close the browser
    browser.close()

No, the page does not load its content dynamically with JavaScript; the listing is entirely static HTML, so it can be fetched with requests and parsed with BeautifulSoup.
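One way to check a claim like this without a browser is to count the div.title elements in the raw HTML the server returns. The following is only a minimal sketch using the standard library; the names TitleDivCounter, count_title_divs and sample_html are hypothetical, and the sample markup is made up to stand in for a real response.

```python
from html.parser import HTMLParser


class TitleDivCounter(HTMLParser):
    """Counts <div> elements whose class list contains "title"."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if "title" in classes:
            self.count += 1


def count_title_divs(html):
    """Return how many <div class="title"> elements appear in the markup."""
    parser = TitleDivCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical sample markup, not copied from the real page
sample_html = """
<div class="block">
  <div class="title"><a href="/item/1">First item</a></div>
  <div class="title"><a href="/item/2">Second item</a></div>
</div>
"""

print(count_title_divs(sample_html))
```

If the count is above zero for the text of a plain HTTP response, the items are already in the static markup and no JavaScript rendering is needed to reach them.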

from bs4 import BeautifulSoup
import requests

# Fetch the static page; no browser is needed
url = 'https://order.mandarake.co.jp/order/listPage/list?categoryCode=07&keyword=naruto&lang=en'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')

# Each title is the link text inside a <div class="title">
data = []
for e in soup.select('div.title'):
    data.append({'title': e.a.get_text(strip=True)})

print(data)

Output:

[{'title': 'NARUTO THE ANIMATION CHRONICLE\u3000genga made for sale'}, {'title': 'Plex DPCF Haruno Sakura Reboru ring of the eyes'}, {'title': 'Naruto: Shippuden\u3000(replica)  ナルト'}, {'title': 'Naruto: Shippuden\u3000(replica)  ナルト'}, {'title': 'Naruto: Shippuden\u3000(replica)  NARUTO -ナルト-'}, {'title': 'Naruto: Shippuden ナルト\u3000(replica)'}, {'title': 'Naruto Shippuuden\u3000(replica) NARUTO -ナルト-'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'MegaHouse    ちみ メガ Petit Chara Land NARUTO SHIPPUDEN ナルト blast-of-wind intermediary   Even [swirl ナルト special is a volume on ばよ.  
  All 6 types set] inner bag not opened/box damaged'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'NARUTO -ナルト- 疾風伝\u3000(複製セル)'}, {'title': 'NARUTO -ナルト- 疾風伝'}, {'title': 'NARUTO -ナルト- 疾風伝'}, {'title': 'NARUTO -ナルト-'}]
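The titles above still contain ideographic spaces (\u3000) and runs of ordinary spaces. If cleaner strings are wanted, one possible post-processing step is sketched here; normalize_title and raw_titles are hypothetical names, and the two sample strings are taken from the output above.

```python
def normalize_title(title):
    """Collapse every run of whitespace into a single ASCII space."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes the ideographic space U+3000
    return " ".join(title.split())


raw_titles = [
    'NARUTO THE ANIMATION CHRONICLE\u3000genga made for sale',
    'Naruto: Shippuden\u3000(replica)  ナルト',
]

clean_titles = [normalize_title(t) for t in raw_titles]
print(clean_titles)
```

The same function could be applied to each title as it is appended to data.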
