How to programmatically log in to Yahoo from an Ubuntu server

Question

I want to log in to my Yahoo account from a script running on an Ubuntu server. I tried using Python with mechanize, but there is a flaw in my plan.

Here is my current code.

import cookielib
import mechanize

loginurl = "https://login.yahoo.com/config/login"
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)  # keep cookies between requests
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
r = br.open(loginurl)
html = r.read()
br.select_form(nr=0)  # assumes the login form is the first form on the page
br.form['login'] = '[mylogin]'
br.form['passwd'] = '[mypassword]'
br.submit()

print br.response().read()

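The response I get back is the Yahoo login page, with prominent red text reading "Javascript must be enabled on your browser" or something similar. The mechanize documentation has a section mentioning pages that create cookies with JS, but the help page returns HTTP 400 (just my luck).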

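Figuring out what the JavaScript does and then doing it by hand sounds like a very difficult task. I'm willing to switch to any tool or language, as long as it runs on an Ubuntu server, even if that means using a different tool to log in and then passing the login cookie back to my Python script. Any help or suggestions are appreciated.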
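For the "log in with a different tool, then pass the cookie back to Python" route: curl, for example, can write its cookies to a Netscape-format cookie file (`-c FILE` / `--cookie-jar FILE`), and Python 3's `http.cookiejar.MozillaCookieJar` reads that format (in Python 2 the same class is `cookielib.MozillaCookieJar`). A minimal sketch, assuming the other tool produces such a file; `load_exported_cookies` is a hypothetical helper and the cookie in the demo file is a placeholder, not a real Yahoo cookie.

```python
import http.cookiejar
import os
import tempfile
import urllib.request


def load_exported_cookies(path):
    """Load a Netscape-format cookie file (the format `curl -c FILE` writes)."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Login cookies are often session cookies; keep them instead of dropping them.
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar


# Demo: write a small cookie file like one exported by another tool (placeholder values).
sample = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t2147483647\tsession_id\tabc123\n"
)
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(sample)

jar = load_exported_cookies(path)
os.remove(path)

# An opener that sends the loaded cookies with every request it makes.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
print([(c.domain, c.name, c.value) for c in jar])
```

With the question's Python 2 mechanize code, a `cookielib.MozillaCookieJar` loaded this way should be usable in place of the empty `LWPCookieJar` passed to `br.set_cookiejar()`, since both are file-backed cookie jars.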

Update:

I don't want to use the Yahoo API.
I also tried scrapy, but I think the same problem is happening there.

My scrapy script:

from scrapy import log
from scrapy.http import FormRequest, Request
from scrapy.selector import HtmlXPathSelector
from scrapy.spider import BaseSpider


class YahooSpider(BaseSpider):
    name = "yahoo"
    start_urls = [
        "https://login.yahoo.com/config/login?.intl=us&.lang=en-US&.partner=&.last=&.src=&.pd=_ver%3D0%26c%3D%26ivt%3D%26sg%3D&pkg=&stepid=&.done=http%3a//my.yahoo.com"
    ]

    def parse(self, response):
        x = HtmlXPathSelector(response)
        print x.select("//input/@value").extract()
        return [FormRequest.from_response(response,
                    formdata={'login': '[my username]', 'passwd': '[mypassword]'},
                    callback=self.after_login)]

    def after_login(self, response):
        # check that login succeeded before going on
        if response.url == 'http://my.yahoo.com':
            return Request("[where i want to go next]",
                      callback=self.next_page, errback=self.error, dont_filter=True)
        else:
            print response.url
            self.log("Login failed.", level=log.CRITICAL)

    def next_page(self, response):
        x = HtmlXPathSelector(response)
        print x.select("//title/text()").extract()

    def error(self, failure):
        # errback referenced by the Request above: log the failure
        self.log("Request failed: %s" % failure, level=log.ERROR)

The scrapy script just prints "https://login.yahoo.com/config/login"... boo
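The `after_login` check compares `response.url` to `'http://my.yahoo.com'` as an exact string, so a trailing slash or a redirect to https would also count as a failure even after a successful login. A hypothetical helper that compares only host and path might look like this (Python 3's `urllib.parse`; in Python 2 the same `urlsplit` lives in the `urlparse` module). Ignoring the scheme and query string is a deliberate and possibly too lax assumption.

```python
from urllib.parse import urlsplit


def same_page(url, expected):
    """Compare two URLs by host and path only, ignoring scheme, query and a trailing slash."""
    a, b = urlsplit(url), urlsplit(expected)
    return (a.netloc.lower() == b.netloc.lower()
            and a.path.rstrip("/") == b.path.rstrip("/"))


print(same_page("https://my.yahoo.com/", "http://my.yahoo.com"))
print(same_page("https://login.yahoo.com/config/login", "http://my.yahoo.com"))
```

In the spider, the check would become `if same_page(response.url, 'http://my.yahoo.com'):`. It does not change what the login POST returns; it only makes the success test less brittle.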

Answer 1

I'm surprised this works:

Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup as BS
>>> import requests
>>> r = requests.get('https://login.yahoo.com/')
>>> soup = BS(r.text)
>>> login_form = soup.find('form', attrs={'name':'login_form'})
>>> hiddens = login_form.findAll('input', attrs={'type':'hidden'})
>>> payload = {}
>>> for h in hiddens:
...     payload[str(h.get('name'))] = str(h.get('value'))
...
>>> payload['login'] = 'testtest481@yahoo.com'
>>> payload['passwd'] = '********'
>>> post_url = str(login_form.get('action'))
>>> r2 = requests.post(post_url, cookies=r.cookies, data=payload)
>>> r3 = requests.get('http://my.yahoo.com', cookies=r2.cookies)
>>> page = r3.text
>>> pos = page.find('testtest481')
>>> print page[ pos - 50 : pos + 300 ]
   You are signed in as: <span class="yuhead-yid">testtest481</span>        </li>    </ul></li><li id="yuhead-me-signout" class="yuhead-me"><a href="http://login.yahoo.com/config/login?logout=1&.direct=2&.done=http://www.yahoo.com&amp;.src=my&amp;.intl=us&amp;.lang=en-US" target="_top" rel="nofollow">            Sign Out        </a><img width='0' h
>>>
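The key step in the session above is collecting every hidden `<input>` of the form named `login_form` and posting them back together with `login` and `passwd`. If BeautifulSoup is not available, the same step can be sketched with Python 3's standard `html.parser`; `HiddenInputCollector` and `build_login_payload` are hypothetical names, and the sample form (field names `csrf_token`, `done`) is made up for illustration, not Yahoo's real markup.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> fields inside one named <form>."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self.in_form and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                # A missing value attribute is submitted as an empty string.
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_payload(html, login, passwd, form_name="login_form"):
    """Return (form action URL, dict of hidden fields plus credentials)."""
    parser = HiddenInputCollector(form_name)
    parser.feed(html)
    parser.close()
    payload = dict(parser.fields)
    # Credentials are set last, as in the interactive session.
    payload.update({"login": login, "passwd": passwd})
    return parser.action, payload


SAMPLE_FORM = """
<form name="login_form" action="https://login.example.com/config/login?" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="done" value="http://my.example.com">
  <input type="text" name="login">
  <input type="password" name="passwd">
</form>
<form name="search"><input type="hidden" name="ignored" value="x"></form>
"""

action, payload = build_login_payload(SAMPLE_FORM, "me@example.com", "secret")
print(action)
print(payload)
```

The resulting dict can then be form-encoded (for example with `urllib.parse.urlencode(payload).encode()`) and POSTed to `action`, sending along the cookies from the first GET.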

Give it a try:

"""
ylogin.py - how-to-login-to-yahoo-programatically-from-an-ubuntu-server

http://stackoverflow.com/questions/11974478/
Test my.yahoo.com login using requests and BeautifulSoup.
"""

from BeautifulSoup import BeautifulSoup as BS
import requests

CREDS = {'login': 'CHANGE ME',
         'passwd': 'CHANGE ME'}
URLS = {'login': 'https://login.yahoo.com/',
        'post': 'https://login.yahoo.com/config/login?',
        'home': 'http://my.yahoo.com/'}


def test():
    cookies = get_logged_in_cookies()
    req_with_logged_in_cookies = requests.get(URLS['home'], cookies=cookies)
    assert 'You are signed in' in req_with_logged_in_cookies.text
    print "If you can see this message you must be logged in."


def get_logged_in_cookies():
    req = requests.get(URLS['login'])
    # The login form carries hidden inputs (tokens) that must be posted back.
    form = BS(req.text).find('form', attrs={'name': 'login_form'})
    hidden_inputs = form.findAll('input', attrs={'type': 'hidden'})
    data = dict(CREDS)
    # Hidden fields go in after the credentials (a hidden field with the same name wins).
    data.update((h.get('name'), h.get('value')) for h in hidden_inputs)
    # Send the cookies from the first GET along with the login POST.
    post_req = requests.post(URLS['post'], cookies=req.cookies, data=data)
    return post_req.cookies


if __name__ == '__main__':
    test()

Add error handling as needed.
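One way to add that error handling, sketched with Python 3's standard `urllib` rather than requests (with requests, `Response.raise_for_status()` plays the role of the `HTTPError` branch). `LoginError`, `check_signed_in` and `fetch_text` are hypothetical names, and the marker string is the one the session above found on my.yahoo.com; the real page may change, so treat it as an assumption.

```python
import urllib.error


class LoginError(Exception):
    """Raised when the login round-trip fails or the page does not show we are signed in."""


# Marker text seen on my.yahoo.com after a successful login (assumed, may change).
SIGNED_IN_MARKER = "You are signed in"


def check_signed_in(page_text, marker=SIGNED_IN_MARKER):
    """Raise LoginError unless the marker appears in the page."""
    if marker not in page_text:
        raise LoginError("signed-in marker %r not found; login probably failed" % marker)


def fetch_text(opener, url, data=None, timeout=30):
    """Open url with a urllib opener and return the decoded body, mapping failures to LoginError."""
    try:
        with opener.open(url, data=data, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except urllib.error.HTTPError as exc:  # must come first: HTTPError subclasses URLError
        raise LoginError("HTTP %d from %s" % (exc.code, url)) from exc
    except urllib.error.URLError as exc:
        raise LoginError("could not reach %s: %s" % (url, exc.reason)) from exc
```

Typical use would be `check_signed_in(fetch_text(opener, URLS['home']))` after posting the login form through the same cookie-aware opener, with the caller catching `LoginError` to retry or report.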

Answer 2

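If the page relies on JavaScript, you might consider something like ghost.py instead of requests or mechanize. ghost.py hosts a WebKit client and should be able to handle these tricky cases with minimal effort.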

Answer 3

Your Scrapy script works for me:

from scrapy.spider import BaseSpider
from scrapy.http import FormRequest
from scrapy.selector import HtmlXPathSelector

class YahooSpider(BaseSpider):
    name = "yahoo"
    start_urls = [
        "https://login.yahoo.com/config/login?.intl=us&.lang=en-US&.partner=&.last=&.src=&.pd=_ver%3D0%26c%3D%26ivt%3D%26sg%3D&pkg=&stepid=&.done=http%3a//my.yahoo.com"
    ]

    def parse(self, response):
        x = HtmlXPathSelector(response)
        print x.select("//input/@value").extract()
        return [FormRequest.from_response(response,
                    formdata={'login': '<username>', 'passwd': '<password>'},
                    callback=self.after_login)]

    def after_login(self, response):
        self.log('Login successful: %s' % response.url)

Output:

stav@maia:myproj$ scrapy crawl yahoo
2012-08-22 20:55:31-0500 [scrapy] INFO: Scrapy 0.15.1 started (bot: drzyahoo)
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Enabled item pipelines:
2012-08-22 20:55:31-0500 [yahoo] INFO: Spider opened
2012-08-22 20:55:31-0500 [yahoo] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-08-22 20:55:31-0500 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-08-22 20:55:32-0500 [yahoo] DEBUG: Crawled (200) <GET https://login.yahoo.com/config/login?.intl=us&.lang=en-US&.partner=&.last=&.src=&.pd=_ver%3D0%26c%3D%26ivt%3D%26sg%3D&pkg=&stepid=&.done=http%3a//my.yahoo.com> (referer: None)
[u'1', u'', u'', u'', u'', u'', u'', u'us', u'en-US', u'', u'', u'93s42g583b3cg', u'0', u'L0iOlEQ1EbZ24TfLRpA43s5offgQ', u'', u'', u'', u'', u'', u'0', u'Y', u'http://my.yahoo.com', u'_ver=0&c=&ivt=&sg=', u'0', u'0', u'0', u'5', u'5', u'', u'y']
2012-08-22 20:55:32-0500 [yahoo] DEBUG: Redirecting (meta refresh) to <GET http://my.yahoo.com> from <POST https://login.yahoo.com/config/login>
2012-08-22 20:55:33-0500 [yahoo] DEBUG: Crawled (200) <GET http://my.yahoo.com> (referer: https://login.yahoo.com/config/login?.intl=us&.lang=en-US&.partner=&.last=&.src=&.pd=_ver%3D0%26c%3D%26ivt%3D%26sg%3D&pkg=&stepid=&.done=http%3a//my.yahoo.com)
2012-08-22 20:55:33-0500 [yahoo] DEBUG: Login successful: http://my.yahoo.com
2012-08-22 20:55:33-0500 [yahoo] INFO: Closing spider (finished)
2012-08-22 20:55:33-0500 [yahoo] INFO: Dumping spider stats:
    {'downloader/request_bytes': 2447,
     'downloader/request_count': 3,
     'downloader/request_method_count/GET': 2,
     'downloader/request_method_count/POST': 1,
     'downloader/response_bytes': 77766,
     'downloader/response_count': 3,
     'downloader/response_status_count/200': 3,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2012, 8, 23, 1, 55, 33, 837619),
     'request_depth_max': 1,
     'scheduler/memory_enqueued': 3,
     'start_time': datetime.datetime(2012, 8, 23, 1, 55, 31, 271262)}

Environment:

stav@maia:myproj$ scrapy version -v
Scrapy  : 0.15.1
lxml    : 2.3.2.0
libxml2 : 2.7.8
Twisted : 11.1.0
Python  : 2.7.3 (default, Aug  1 2012, 05:14:39) - [GCC 4.6.3]
Platform: Linux-3.2.0-29-generic-x86_64-with-Ubuntu-12.04-precise

Answer 4

When JS has to be enabled and no display is available, phantomjs is a good solution. Mind you, it's JS, not Python :$

Answer 5

You could try PhantomJS, a headless WebKit with a JavaScript API: http://phantomjs.org/ It supports programmatic, JavaScript-enabled browsing.
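If the login itself runs in a headless browser, the question's idea of passing the cookies back to Python still works: dump them as JSON and rebuild a cookie jar. A sketch, assuming each exported cookie is a dict with `name`, `value`, `domain` and optional `path`, `secure` and `expiry` keys. That matches the shape Selenium's `driver.get_cookies()` returns; PhantomJS's `phantom.cookies` objects look similar, but check their field names before reusing this. `jar_from_browser_cookies` is a hypothetical helper and the cookie values are placeholders.

```python
import http.cookiejar
import json
import urllib.request


def jar_from_browser_cookies(cookie_dicts):
    """Convert browser-exported cookie dicts into a CookieJar usable by urllib."""
    jar = http.cookiejar.CookieJar()
    for c in cookie_dicts:
        domain = c["domain"]
        expires = c.get("expiry")  # seconds since the epoch; absent for session cookies
        jar.set_cookie(http.cookiejar.Cookie(
            version=0, name=c["name"], value=c["value"],
            port=None, port_specified=False,
            domain=domain,
            # A leading dot means "this domain and its subdomains".
            domain_specified=domain.startswith("."),
            domain_initial_dot=domain.startswith("."),
            path=c.get("path", "/"), path_specified=True,
            secure=bool(c.get("secure", False)),
            expires=int(expires) if expires is not None else None,
            discard=expires is None,
            comment=None, comment_url=None, rest={},
        ))
    return jar


# Placeholder export, e.g. what json.dumps(driver.get_cookies()) might have written.
exported = json.loads(
    '[{"name": "session_id", "value": "abc", "domain": ".example.com", "path": "/", "secure": false},'
    ' {"name": "pref", "value": "xyz", "domain": "my.example.com", "path": "/"}]'
)
jar = jar_from_browser_cookies(exported)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Show which Cookie header would be sent, without touching the network.
req = urllib.request.Request("http://my.example.com/")
jar.add_cookie_header(req)
print(req.get_header("Cookie"))
```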

Answer 6

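Why not use FancyURLopener? It handles standard HTTP errors and has a prompt_user_passwd() method. From the link:

When performing basic authentication, a FancyURLopener instance calls its prompt_user_passwd() method. The default implementation asks the users for the required information on the controlling terminal. A subclass may override this method to support more appropriate behavior if needed.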
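`FancyURLopener` is deprecated in Python 3. A non-interactive counterpart, sketched here as an assumption about what this answer is after, registers the credentials with `HTTPBasicAuthHandler` instead of overriding `prompt_user_passwd()`. Note that this only answers an HTTP Basic `401` challenge; the login in the question's code is an HTML form POST (`login`/`passwd` fields), which Basic auth would not get past. The URL and credentials are placeholders.

```python
import urllib.request

# Hypothetical protected URL and credentials; replace with your own.
PROTECTED = "https://example.com/protected"

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None: use these credentials for any realm at or below PROTECTED.
password_mgr.add_password(None, PROTECTED, "alice", "s3cret")

auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)
# opener.open(PROTECTED + "/page") would answer a Basic auth challenge automatically
# instead of prompting on the terminal.

print(password_mgr.find_user_password("Some Realm", PROTECTED + "/page"))
```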

Question description

6 solutions

Solution 1
3 2012-08-17 23:17:09

Solution 2
2 2012-08-24 06:22:22

Solution 3
1 2012-08-23 14:08:46

Solution 4
1 2012-08-23 21:36:26

Solution 5
0 2012-08-29 16:21:09

Solution 6
0 2012-08-29 17:06:22
