
Python - How to read content of web page without using url?

I am trying to write a Python program that logs in to Gmail and reads the inbox page. This is what I have tried, using Selenium and urllib2 (I am new to both):

from requests import session
from selenium import webdriver
import getpass
import urllib2



def gmail_login(username, passw) :
    with session() as c :
        webpage = r'https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2&emr=1&osid=1#identifier'

        # Raw string so the backslashes in the Windows path are never read as escapes.
        driver = webdriver.Chrome(r'C:\Users\chromedriver_win32\chromedriver.exe')
        driver.get(webpage)

        driver.implicitly_wait(10)

        driver.find_element_by_name('Email').send_keys(username)

        driver.find_element_by_name('signIn').click() # Click 'Next' button after entry of email id.

        driver.find_element_by_id('Passwd').send_keys(passw)

        driver.find_element_by_id('signIn').click() # Click 'Sign In' button after entry of password.

        url = driver.current_url

        readPage(url)

def readPage(url):
    print url

    # Request the URL again with urllib2 and save whatever comes back.
    fName = "gmail_file.html"
    response = urllib2.urlopen(url)
    html = response.read()
    with open(fName, "w") as f:
        f.write(html)

gmail_login('username', 'password')

I got the login part working, but I can't read the inbox page. My code reopens the inbox page from its URL, reads it, and saves it to an HTML file. But all I get in the HTML file is the login page! I am guessing that opening the inbox page directly from its URL is not allowed and is protected.

So I'm looking for a way to read the content of a web page (any page, not only Gmail) without needing its URL. (The only way I know to read a web page is urlopen(), which requires the URL.) Is there a function or library for this?
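The saved file most likely contains the login page because urllib2.urlopen() makes a brand-new HTTP request that does not carry the cookies of the logged-in Chrome session. Selenium can hand you the HTML of the page it is already showing through driver.page_source, so no second request by URL is needed. Here is a minimal sketch, assuming `driver` is the logged-in WebDriver from gmail_login; save_rendered_page is a hypothetical helper name, not part of Selenium:

```python
import io


def save_rendered_page(driver, file_name="gmail_file.html"):
    """Write the page the browser is currently showing to file_name.

    `driver` is assumed to be a Selenium WebDriver that has already logged in.
    Only its `page_source` attribute is used here.
    """
    html = driver.page_source  # HTML of the page as the browser currently has it
    with io.open(file_name, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

Inside gmail_login you would call save_rendered_page(driver) in place of readPage(url). Gmail builds much of the inbox with JavaScript, so you may have to wait for the inbox to finish loading before reading page_source.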

You can try Python's imaplib package to read and manage all your mail over the IMAP protocol.

You can find a code example here
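In case the linked example is unavailable, here is a minimal sketch of the imaplib approach in Python 3. It assumes IMAP access is enabled for the account and that imap.gmail.com is the host; depending on your account settings, Google may require an app password instead of your normal one. The function names summarize_message and fetch_unread_summaries are made up for this example:

```python
import email
import imaplib


def summarize_message(raw_bytes):
    """Return (sender, subject) parsed from one raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    # Headers are returned as stored; encoded (RFC 2047) headers are not decoded here.
    return msg["From"], msg["Subject"]


def fetch_unread_summaries(username, password, host="imap.gmail.com"):
    """Log in over IMAP and return (sender, subject) for each unread inbox message."""
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(username, password)
        # readonly=True so fetching does not mark the messages as read.
        conn.select("INBOX", readonly=True)
        status, data = conn.search(None, "UNSEEN")
        summaries = []
        for num in data[0].split():
            status, msg_data = conn.fetch(num, "(RFC822)")
            # msg_data[0] is a (header info, raw message bytes) pair.
            summaries.append(summarize_message(msg_data[0][1]))
        return summaries
    finally:
        conn.logout()
```

Calling fetch_unread_summaries('username', 'password') would return a list of (From, Subject) pairs. No browser is involved, so there is no login page to get in the way.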

You could use Charlie Guo's gmail package. Once installed, you can use it like this:

import gmail

g = gmail.login("devansh_sharma@gmail.com", "password123")

emails = g.inbox().mail(unread=True)

for email in emails:
    email.fetch()
    header_from = email.headers['From']
    subject = email.headers['Subject']
    body = email.body
    # ... do something cool with your gmail ...

That's going to be much more reliable and simpler than screen scraping.
