
Using cookies.txt file with Python Requests

I'm trying to access an authenticated site using a cookies.txt file (generated with a Chrome extension) with Python Requests:

import requests, cookielib

cj = cookielib.MozillaCookieJar('cookies.txt')
cj.load()
r = requests.get(url, cookies=cj)

MozillaCookieJar inherits from FileCookieJar, which has the following docstring in its constructor:

Cookies are NOT loaded from the named file until either the .load() or
.revert() method is called.

So you need to call the .load() method.

Also, as Jermaine Xu noted, the first line of the file needs to contain either the # Netscape HTTP Cookie File or the # HTTP Cookie File string. Files generated by the plugin you use do not contain such a string, so you have to insert it yourself. I raised an appropriate bug at http://code.google.com/p/cookie-txt-export/issues/detail?id=5
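A minimal sketch of inserting that header yourself, assuming the export is a plain UTF-8 text file. The helper name `ensure_netscape_header` is hypothetical, and the check simply mirrors the two header strings quoted above:

```python
import re
from pathlib import Path

# Matches either accepted first line: "# Netscape HTTP Cookie File"
# or "# HTTP Cookie File".
MAGIC = re.compile(r"#( Netscape)? HTTP Cookie File")
HEADER = "# Netscape HTTP Cookie File\n"


def ensure_netscape_header(path):
    """Prepend the header line if the first line lacks it.

    Returns True if the file was modified, False otherwise.
    """
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    first_line = text.split("\n", 1)[0]
    if MAGIC.match(first_line):
        return False
    p.write_text(HEADER + text, encoding="utf-8")
    return True
```

Calling `ensure_netscape_header('cookies.txt')` once before `cj.load()` should be enough; a second call leaves the file untouched.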

EDIT

Session cookies are saved with 0 in the 5th column. If you don't pass ignore_expires=True to the load() method, all such cookies are discarded when loading from a file.

File session_cookie.txt:

# Netscape HTTP Cookie File
.domain.com TRUE    /   FALSE   0   name    value

Python script:

import cookielib

cj = cookielib.MozillaCookieJar('session_cookie.txt')
cj.load()
print len(cj)

Output: 0
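For comparison, here is a Python 3 sketch of the same experiment using `http.cookiejar` (the renamed `cookielib`). It writes the sample file to a temporary directory and loads it twice, once with defaults and once leniently. Passing `ignore_discard=True` alongside `ignore_expires=True` is an added assumption here, so the cookie is kept whether the loader treats `0` as expired or as a session marker. The helper name `count_cookies` is hypothetical.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar

# Same content as session_cookie.txt above; fields are tab-separated.
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n"
)


def count_cookies(path, **load_kwargs):
    """Load a cookies.txt file and return how many cookies survived."""
    jar = MozillaCookieJar(path)
    jar.load(**load_kwargs)
    return len(jar)


# Write the sample file somewhere disposable.
path = os.path.join(tempfile.mkdtemp(), "session_cookie.txt")
with open(path, "w") as fp:
    fp.write(SAMPLE)

default_count = count_cookies(path)
lenient_count = count_cookies(path, ignore_discard=True, ignore_expires=True)
print(default_count, lenient_count)
```

The first count reproduces the empty jar shown above; the second shows the cookie being kept once the flags are passed.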

EDIT 2

Although we managed to get cookies into the jar above, they are subsequently discarded by cookielib because they still have a 0 value in the expires attribute. To prevent this, we have to set the expiry time to some future time, like so:

import time

for cookie in cj:
    # set cookie expire date to 14 days from now
    cookie.expires = time.time() + 14 * 24 * 3600
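Putting the pieces together, a self-contained Python 3 sketch: load the file leniently, push each expiry into the future, then check that the cookie would actually be sent. The helper name `load_with_future_expiry` and the host `www.domain.com` are illustrative assumptions. The check uses the standard library's `add_cookie_header` instead of Requests so that it runs without network access.

```python
import os
import tempfile
import time
import urllib.request
from http.cookiejar import MozillaCookieJar


def load_with_future_expiry(path, days=14):
    """Load cookies.txt and move every expiry `days` into the future."""
    jar = MozillaCookieJar(path)
    # Keep session/expired entries that a default load() would drop.
    jar.load(ignore_discard=True, ignore_expires=True)
    future = int(time.time()) + days * 24 * 3600
    for cookie in jar:
        cookie.expires = future
    return jar


path = os.path.join(tempfile.mkdtemp(), "session_cookie.txt")
with open(path, "w") as fp:
    fp.write("# Netscape HTTP Cookie File\n")
    fp.write(".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n")

jar = load_with_future_expiry(path)

# Build a request object (nothing is sent) and let the jar attach cookies.
req = urllib.request.Request("http://www.domain.com/")
jar.add_cookie_header(req)
cookie_header = req.get_header("Cookie")
print(cookie_header)
```

The same `jar` can then be passed as `cookies=jar` to `requests.get`, as in the question.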

EDIT 3

I checked both wget and curl, and both use a 0 expiry time to denote session cookies, which means it's the de facto standard. However, Python's implementation uses an empty string for the same purpose, hence the problem raised in the question. I think Python's behavior in this regard should be in line with what wget and curl do, and that's why I raised the bug at http://bugs.python.org/issue17164
I'll note that replacing 0s with empty strings in the 5th column of the input file and passing ignore_discard=True to load() is an alternate way of solving the problem (no need to change the expiry time in this case).
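A rough Python 3 sketch of that alternate route, assuming a tab-separated file with seven fields per cookie line. The helper name `blank_zero_expiry` and the output file name are hypothetical; the original file is left untouched and a fixed copy is written next to it.

```python
import os
import tempfile
from http.cookiejar import MozillaCookieJar


def blank_zero_expiry(src, dst):
    """Copy src to dst, emptying the 5th column wherever it is '0'."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fields = line.rstrip("\n").split("\t")
            # Cookie lines have 7 tab-separated fields; expiry is the 5th.
            is_cookie = not line.startswith("#") and len(fields) == 7
            if is_cookie and fields[4] == "0":
                fields[4] = ""
                line = "\t".join(fields) + "\n"
            fout.write(line)


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "session_cookie.txt")
dst = os.path.join(workdir, "session_cookie_fixed.txt")
with open(src, "w") as fp:
    fp.write("# Netscape HTTP Cookie File\n")
    fp.write(".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n")

blank_zero_expiry(src, dst)

jar = MozillaCookieJar(dst)
# An empty expiry marks a session cookie, which load() drops by default.
jar.load(ignore_discard=True)
cookie = next(iter(jar))
print(cookie.name, cookie.expires, cookie.discard)
```

With an empty expiry the cookie carries no expiry time at all, so nothing needs to be pushed into the future afterwards.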

I tried taking into account everything that Piotr Dobrogost had valiantly figured out about MozillaCookieJar, but to no avail. I got fed up and just parsed the damn cookies.txt myself, and now all is well:

import re
import requests

def parseCookieFile(cookiefile):
    """Parse a cookies.txt file and return a dictionary of key value pairs
    compatible with requests."""

    cookies = {}
    with open (cookiefile, 'r') as fp:
        for line in fp:
            if not re.match(r'^\#', line):
                lineFields = line.strip().split('\t')
                cookies[lineFields[5]] = lineFields[6]
    return cookies

cookies = parseCookieFile('cookies.txt')

import pprint
pprint.pprint(cookies)

r = requests.get('https://example.com', cookies=cookies)
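A slightly more defensive variant of the same idea, offered as a sketch rather than a drop-in replacement. It assumes seven tab-separated fields per cookie line, skips blank or malformed lines instead of raising, and also reads lines carrying the `#HttpOnly_` prefix that some exporters (curl among them) put in front of HttpOnly cookies. The function name `parse_cookie_lines` is hypothetical.

```python
HTTPONLY_PREFIX = "#HttpOnly_"


def parse_cookie_lines(lines):
    """Return {name: value} from an iterable of cookies.txt lines."""
    cookies = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Some exporters mark HttpOnly cookies by prefixing the line.
        if line.startswith(HTTPONLY_PREFIX):
            line = line[len(HTTPONLY_PREFIX):]
        # Skip blank lines and ordinary comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a well-formed cookie line
        domain, subdomains, path, secure, expires, name, value = fields
        cookies[name] = value
    return cookies


sample = [
    "# Netscape HTTP Cookie File\n",
    "\n",
    ".domain.com\tTRUE\t/\tFALSE\t0\tname\tvalue\n",
    "#HttpOnly_.domain.com\tTRUE\t/\tTRUE\t0\tsid\tabc123\n",
]
parsed = parse_cookie_lines(sample)
print(parsed)
```

To parse a real file, pass the open file object: `parse_cookie_lines(open('cookies.txt'))`. Keep in mind that a plain dict drops the domain and path columns, so every cookie in it is sent to whatever URL is requested.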

This worked for me:

from http.cookiejar import MozillaCookieJar
from pathlib import Path
import requests

cookies = Path('/Users/name/cookies.txt')
jar = MozillaCookieJar(cookies)
jar.load()
requests.get('https://path.to.site.com', cookies=jar)
<Response [200]>

I tried editing Tristan's answer to add some info to it, but the SO edit queue seems to be full, so I am writing this answer instead. I struggled badly with using existing cookies with Python Requests.

  1. First, get the cookies from Chrome. The easiest way is to use an extension called 'cookies.txt':
https://chrome.google.com/webstore/detail/get-cookiestxt/bgaddhkoddajcdgocldbbfleckgcbcid/related
  1. After downloading those cookies, use the code below to make sure that you are able to parse the file without any issues.
    import re
    import requests
    
    def parseCookieFile(cookiefile):
        """Parse a cookies.txt file and return a dictionary of key value pairs
        compatible with requests."""
    
        cookies = {}
        with open (cookiefile, 'r') as fp:
            for line in fp:
                if not re.match(r'^\#', line):
                    lineFields = line.strip().split('\t')
                    # A cookie line has 7 tab-separated fields;
                    # skip blank or malformed lines.
                    if len(lineFields) >= 7:
                        cookies[lineFields[5]] = lineFields[6]
        return cookies
    
    cookies = parseCookieFile('cookies.txt') #replace the filename
    
    import pprint
    pprint.pprint(cookies)
  1. Next, use those cookies with Python Requests:
x = requests.get('your__url', verify=False, cookies=cookies)
print (x.content)

This should save you from going through different SO posts and trying cookielib and other methods, which never worked for me.

I finally found a way to make it work (I got the idea by looking at curl's verbose output): instead of loading my cookies from a file, I simply created a dict with the required name/value pairs:

cd = {'n1': 'v1', 'n2': 'v2'}
r = requests.get(url, cookies=cd)

and it worked (although it doesn't explain why the previous method didn't). Thanks for all the help, it's really appreciated.
