python-requests cannot get the JSESSIONID and SessionData cookies

Question

I want to scrape a PDF file from http://www.jstor.org/stable/pdf/10.1086/512825.pdf, but the site wants me to accept its Terms and Conditions first. While downloading from a browser, I found that JSTOR saves my acceptance in two cookies named JSESSIONID and SessionData, but python-requests does not grab these two cookies (it grabs two other cookies, but not these).

Here is my session instantiation code:

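import requests
from fake_useragent import UserAgent  # assumed source of UserAgent

def get_raw_session():
    # Persistent session that sends a random browser User-Agent
    session = requests.Session()
    session.headers.update({'User-Agent': UserAgent().random})
    session.headers.update({'Connection': 'keep-alive'})
    return session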

Note that I have used python-requests for login-required sites several times before and it worked great, but in this case it does not.

My guess is that the problem is that JSTOR is built with JSP and python-requests does not support that.

Any ideas?

Answer 1

The following code works perfectly fine for me:

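import requests
from bs4 import BeautifulSoup

s = requests.session()
# The first request returns the Terms and Conditions page, not the PDF
r = s.get('http://www.jstor.org/stable/pdf/10.1086/512825.pdf')
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, 'html.parser')
# The "accept" link (id='acptTC') points to the actual PDF
pdfurl = 'http://www.jstor.org' + soup.find('a', id='acptTC')['href']
with open('export.pdf', 'wb') as handle:
    # Reuse the same session so cookies set by the server are sent back
    response = s.get(pdfurl, stream=True)
    for block in response.iter_content(1024):
        if not block:
            break
        handle.write(block)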
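
If the page layout changes, the lookup of the accept link above fails with a TypeError, and it is hard to tell which cookies the session actually holds. Here is a minimal sketch of two helpers for that, not a definitive implementation. The names find_accept_url and cookie_names are my own, and I am assuming the accept link still carries id='acptTC' as in the code above.

```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def find_accept_url(html, base_url, link_id='acptTC'):
    """Return the absolute URL of the accept link, or None if it is missing.

    link_id defaults to the id used in the answer above; this is an
    assumption about the page, not something the site guarantees.
    """
    soup = BeautifulSoup(html, 'html.parser')
    link = soup.find('a', id=link_id)
    if link is None or not link.get('href'):
        return None
    # urljoin handles both relative and absolute href values
    return urljoin(base_url, link['href'])


def cookie_names(session):
    """Return the sorted names of the cookies the session currently holds."""
    return sorted(cookie.name for cookie in session.cookies)
```

Calling cookie_names(s) after each s.get(...) shows when a cookie such as JSESSIONID actually appears in the session, and checking find_accept_url(r.text, r.url) for None lets you stop with a clear message instead of crashing on a missing link.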

Answer 1, accepted, 2015-06-16 21:53:17
