
How to get the HTML inside an iframe's #document in Python?

<html>
<head>...</head>
<body>
    <iframe id="hiddenFrame" name="hiddenFrame">
        #document
            <html>
                <head>...</head>
                <body>...</body>
            </html>
    </iframe>
</body>
</html>

This is the structure of the website that I want to crawl. I tried to get the HTML inside the #document node (with both urllib.request and requests), but the response does not contain it.

request result:

<html>
    <head>...</head>
    <body>
        <iframe></iframe>
    </body>
</html>

There is nothing inside the iframe tag. How can I get the HTML inside #document?

I usually use Selenium to handle these situations. Basically, you have to switch into the iframe to get its content.

See this question.
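
A minimal sketch of that approach, assuming Selenium 4 with Chrome and a matching driver available. The helper names `iframe_html` and `fetch_with_chrome` are hypothetical; the frame id `hiddenFrame` comes from the question. If the frame is filled in late by JavaScript, an explicit wait before switching may be needed.

```python
def iframe_html(driver, frame_ref="hiddenFrame"):
    """Return the HTML of the document inside the given iframe.

    frame_ref is the iframe's id or name (an assumption taken from the
    question's markup). The driver is switched back to the top-level
    page afterwards, even if reading the source fails.
    """
    driver.switch_to.frame(frame_ref)      # enter the iframe's #document
    try:
        return driver.page_source          # now refers to the frame's HTML
    finally:
        driver.switch_to.default_content() # return to the outer page


def fetch_with_chrome(url, frame_ref="hiddenFrame"):
    """Open url in Chrome and return the iframe's HTML (hypothetical helper)."""
    # Imported here so iframe_html itself works with any WebDriver instance.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        return iframe_html(driver, frame_ref)
    finally:
        driver.quit()
```

Calling `fetch_with_chrome(url)` with the page's address should return the inner document as a string, which can then be parsed like any other HTML.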

Doesn't the iframe have a src attribute?

If it does, why not do this:

First, get the page using requests, then read the iframe's src attribute using beautifulsoup4.

Once you have the src value, send a second request to that URL.

Voila, you will get the page inside the iframe.
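
The two steps above can be sketched as follows. To keep the example free of extra installs, it swaps in the standard library (`urllib.request` and `html.parser`) for requests and beautifulsoup4; the steps are the same either way. The names `IframeSrcFinder`, `iframe_url`, and `fetch_iframe_html` are hypothetical, the frame id `hiddenFrame` is taken from the question, and UTF-8 decoding is an assumption about the target site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class IframeSrcFinder(HTMLParser):
    """Records the src of the first <iframe> whose id matches frame_id."""

    def __init__(self, frame_id):
        super().__init__()
        self.frame_id = frame_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and self.src is None and attrs.get("id") == self.frame_id:
            self.src = attrs.get("src")


def iframe_url(page_html, page_url, frame_id="hiddenFrame"):
    """Return the absolute URL the iframe points at, or None if it has no src."""
    finder = IframeSrcFinder(frame_id)
    finder.feed(page_html)
    if not finder.src:
        return None
    # src is often relative, so resolve it against the outer page's URL.
    return urljoin(page_url, finder.src)


def fetch_iframe_html(page_url, frame_id="hiddenFrame"):
    """Fetch the outer page, then fetch the document its iframe refers to."""
    with urlopen(page_url, timeout=10) as resp:
        outer = resp.read().decode("utf-8", errors="replace")  # assumes UTF-8
    inner_url = iframe_url(outer, page_url, frame_id)
    if inner_url is None:
        return None  # no src attribute, so there is nothing to request
    with urlopen(inner_url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If `fetch_iframe_html` returns `None`, the iframe has no src in the raw response, which suggests its content is injected by JavaScript; in that case a browser-driven tool is probably the better fit.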
