Mechanize and Python not handling cookies properly

I have a Python script that uses a mechanize browser to log into a self-hosted WordPress blog and then, after the automatic redirect to the dashboard, navigates to a different page to automate several built-in functions.

The script works perfectly on most of my blogs, but on one of them it gets stuck in an endless loop.

The only difference is that the failing blog runs a plugin called Wassup. This plugin sets a session cookie for every visitor, and I suspect that is what is causing the issue.

When the script goes to the new page, WordPress doesn't receive the right cookie, decides that the browser isn't logged in and redirects to the login page. The script logs in again, attempts the same function, and round we go again.
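While debugging, it can help to stop the script from logging in forever. A minimal sketch, assuming WordPress sends logged-out requests for /wp-admin/ pages to wp-login.php (its usual behaviour): check where each request actually ended up and give up after a couple of re-logins. `fetch` and `login` are hypothetical callables standing in for the mechanize calls in the script; `fetch(url)` is assumed to return the final URL after redirects together with the page HTML.

```python
from urllib.parse import urlsplit


def landed_on_login(final_url):
    # Assumption: WordPress redirects logged-out /wp-admin/ requests to wp-login.php
    return urlsplit(final_url).path.endswith("/wp-login.php")


def open_admin_page(fetch, login, url, max_logins=2):
    """Fetch url, logging in again at most max_logins times.

    fetch(url) -> (final_url_after_redirects, html) and login() are
    hypothetical callables supplied by the caller.
    """
    for attempt in range(max_logins + 1):
        final_url, html = fetch(url)
        if not landed_on_login(final_url):
            return html
        if attempt < max_logins:
            login()
    # Still bounced to the login page: fail loudly instead of looping forever
    raise RuntimeError("still redirected to wp-login.php after %d logins: %s"
                       % (max_logins, url))
```

In the script, `fetch` could wrap `r = self.br.open(url)` and return `(self.br.geturl(), r.read())`, so the loop stops with an error that shows which URL keeps bouncing.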

I tried Twill, which logs in and handles the cookies correctly, but by default Twill writes everything to the command line. That is not the behaviour I want: at this point I am manipulating the page and need access to the raw HTML.

This is the setup code:

import mechanize

# Browser
self.br = mechanize.Browser()

# Cookie jar
policy = mechanize.DefaultCookiePolicy(rfc2965=True)
cj = mechanize.LWPCookieJar(policy=policy)
self.br.set_cookiejar(cj)

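After a successful login I call this function:

def open(self):
    # self.burl may or may not include the scheme: build the full URL,
    # then keep self.burl as the bare host for later calls
    if str(self.burl).startswith('http://'):
        site = str(self.burl) + '/wp-admin/plugin-install.php'
        self.burl = self.burl[len('http://'):]
    else:
        site = 'http://' + str(self.burl) + '/wp-admin/plugin-install.php'
    try:
        r = self.br.open(site, timeout=1000)
        return r.read()
    except mechanize.HTTPError as e:
        # On an HTTP error, return the status code as a string
        return str(e.code)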

I'm thinking that I will need to save the cookies to a file and then shuffle the order so the WordPress session cookie gets returned before the Wassup one.

Any other suggestions?
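For the save-to-file route, note that a plain `save()` skips session cookies (cookies with no expiry date), so `ignore_discard=True` is needed to see them at all. A minimal sketch using the standard library's `http.cookiejar.LWPCookieJar`, which mechanize's own `LWPCookieJar` closely mirrors (as far as I know it accepts the same `save()` arguments); `dump_cookie_jar` is a hypothetical helper name. The returned pairs show which host each cookie is stored under.

```python
import http.cookiejar


def dump_cookie_jar(jar, path):
    """Save every cookie in an LWPCookieJar to path and return sorted
    (domain, name) pairs so you can see which host each cookie belongs to."""
    # ignore_discard=True keeps session cookies that save() skips by default;
    # ignore_expires=True keeps already-expired cookies as well
    jar.save(path, ignore_discard=True, ignore_expires=True)
    return sorted((cookie.domain, cookie.name) for cookie in jar)
```

For example, calling `print(dump_cookie_jar(cj, "cookies.lwp"))` right after logging in, with the `cj` jar from the setup code, lists each cookie next to the host it was stored for.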

This turned out to be quite a different problem, with a different fix, than it first seemed, so I have decided to post the answer here for anyone who reads this later.

When a WordPress site is set up, there is an option for the URL to default to http://sample.com or http://www.sample.com. This turned out to be a problem for cookie storage. Cookies are stored against the host name that set them, so a cookie saved for www.sample.com is not sent with a request to sample.com. My program semi-hardcodes the URL in one or the other of these formats, so whenever it requested a new page using the format the site didn't use, the browser had no matching cookie to send, and WordPress rightly decided I wasn't logged in and sent me back to the login page.
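The host scoping can be reproduced offline with the standard library's `http.cookiejar`, whose matching rules mechanize's cookie handling closely follows. A minimal sketch, assuming a host-only cookie (one set without a `Domain` attribute) stored for www.sample.com; the cookie name and value are made up for the example.

```python
import http.cookiejar
import urllib.request


def host_only_cookie(name, value, host):
    # Mimics a Set-Cookie header with no Domain attribute: the cookie is
    # stored against the exact host that sent it (domain_specified=False)
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=host, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def cookie_header_for(jar, url):
    # Ask the jar which cookies it would attach to a request for this URL
    request = urllib.request.Request(url)
    jar.add_cookie_header(request)
    return request.get_header("Cookie")  # None when no cookie matches


jar = http.cookiejar.CookieJar()
jar.set_cookie(host_only_cookie("wordpress_logged_in_example", "abc", "www.sample.com"))

print(cookie_header_for(jar, "http://www.sample.com/wp-admin/"))  # wordpress_logged_in_example=abc
print(cookie_header_for(jar, "http://sample.com/wp-admin/"))      # None
```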

The fix is to grab the URL delivered in the redirect after login and update the variable (in this case self.burl) to match what the .htaccess file expects to see.
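A minimal sketch of that fix, assuming the script can read the URL it ended up on after the login redirect (with mechanize, `self.br.geturl()` after submitting the login form). The helper names are hypothetical; they simply take the host, and the scheme, from that URL instead of from the hard-coded value.

```python
from urllib.parse import urlsplit, urlunsplit


def host_from_redirect(post_login_url):
    # The host exactly as WordPress redirected to it (including any port),
    # e.g. "www.sample.com" or "sample.com"
    return urlsplit(post_login_url).netloc


def admin_url(post_login_url, page):
    # Build an admin URL on the same scheme and host as the redirect, so the
    # cookie jar has cookies stored for that host
    parts = urlsplit(post_login_url)
    return urlunsplit((parts.scheme, parts.netloc, "/wp-admin/" + page, "", ""))


redirect = "http://www.sample.com/wp-admin/"
print(host_from_redirect(redirect))               # www.sample.com
print(admin_url(redirect, "plugin-install.php"))  # http://www.sample.com/wp-admin/plugin-install.php
```

In the `open()` method above, that would mean setting `self.burl = host_from_redirect(self.br.geturl())` once, right after logging in, rather than normalising the hard-coded value.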

This fixed the problem across all my blogs, since some of them use one format and some the other.

I hope this helps anyone using requests, twill, mechanize, etc.

