![](/img/trans.png)
[英]Authentication Failed - 'Authorization' header is missing - Python HTTP request to Azure
[英]Parse an HTTP request Authorization header with Python
我需要这样的标题:
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
并使用 Python 将其解析为:
{'protocol':'Digest',
'qop':'chap',
'realm':'testrealm@host.com',
'username':'Foobear',
'response':'6629fae49393a05397450978507c4ef1',
'cnonce':'5ccc069c403ebaf9f0171e9517f40e41'}
有没有图书馆可以做到这一点,或者我可以寻找灵感?
我正在 Google App Engine 上执行此操作,我不确定 Pyparsing 库是否可用,但如果它是最佳解决方案,也许我可以将它包含在我的应用程序中。
目前我正在创建我自己的 MyHeaderParser 对象,并在标头字符串上将它与 reduce() 一起使用。 它正在工作,但非常脆弱。
nadia 的出色解决方案如下:
import re
reg = re.compile('(\w+)[=] ?"?(\w+)"?')
s = """Digest
realm="stackoverflow.com", username="kixx"
"""
print str(dict(reg.findall(s)))
一个小正则表达式:
import re
reg=re.compile('(\w+)[:=] ?"?(\w+)"?')
>>>dict(reg.findall(headers))
{'username': 'Foobear', 'realm': 'testrealm', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'response': '6629fae49393a05397450978507c4ef1', 'Authorization': 'Digest'}
您也可以像 [CheryPy][1] 一样使用 urllib2。
这是片段:
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
import urllib2
field, sep, value = input.partition("Authorization: Digest ")
if value:
items = urllib2.parse_http_list(value)
opts = urllib2.parse_keqv_list(items)
opts['protocol'] = 'Digest'
print opts
它输出:
{'username': 'Foobear', 'protocol': 'Digest', 'qop': 'chap', 'cnonce': '5ccc069c403ebaf9f0171e9517f40e41', 'realm': 'testrealm@host.com', 'response': '6629fae49393a05397450978507c4ef1'}
[1]: https : //web.archive.org/web/20130118133623/http : //www.google.com : 80/codesearch/p?hl=en#OQvO9n2mc04/CherryPy-3.0.1/cherrypy/lib/ httpauth.py&q=授权摘要 http lang:python
这是我的 pyparsing 尝试:
text = """Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foobear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41" """
from pyparsing import *
AUTH = Keyword("Authorization")
ident = Word(alphas,alphanums)
EQ = Suppress("=")
quotedString.setParseAction(removeQuotes)
valueDict = Dict(delimitedList(Group(ident + EQ + quotedString)))
authentry = AUTH + ":" + ident("protocol") + valueDict
print authentry.parseString(text).dump()
打印:
['Authorization', ':', 'Digest', ['qop', 'chap'], ['realm', 'testrealm@host.com'],
['username', 'Foobear'], ['response', '6629fae49393a05397450978507c4ef1'],
['cnonce', '5ccc069c403ebaf9f0171e9517f40e41']]
- cnonce: 5ccc069c403ebaf9f0171e9517f40e41
- protocol: Digest
- qop: chap
- realm: testrealm@host.com
- response: 6629fae49393a05397450978507c4ef1
- username: Foobear
我不熟悉 RFC,但我希望这能让您有所收获。
一个较旧的问题,但我发现它很有帮助。
(在最近的投票后编辑)我创建了一个基于此答案的包(链接到测试以查看如何使用包中的类)。
pip install authparser
我需要一个解析器来处理任何格式正确的 Authorization 标头,如RFC7235所定义(如果您喜欢阅读 ABNF,请举手)。
Authorization = credentials
BWS = <BWS, see [RFC7230], Section 3.2.3>
OWS = <OWS, see [RFC7230], Section 3.2.3>
Proxy-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS
challenge ] )
Proxy-Authorization = credentials
WWW-Authenticate = *( "," OWS ) challenge *( OWS "," [ OWS challenge
] )
auth-param = token BWS "=" BWS ( token / quoted-string )
auth-scheme = token
challenge = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param ) *(
OWS "," [ OWS auth-param ] ) ] ) ]
credentials = auth-scheme [ 1*SP ( token68 / [ ( "," / auth-param )
*( OWS "," [ OWS auth-param ] ) ] ) ]
quoted-string = <quoted-string, see [RFC7230], Section 3.2.6>
token = <token, see [RFC7230], Section 3.2.6>
token68 = 1*( ALPHA / DIGIT / "-" / "." / "_" / "~" / "+" / "/" )
*"="
从PaulMcG的回答开始,我想出了这个:
import pyparsing as pp
tchar = '!#$%&\'*+-.^_`|~' + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
auth_parser = header + pp.Suppress(':') + credentials
这允许解析任何 Authorization 标头:
parsed = auth_parser.parseString('Authorization: Basic Zm9vOmJhcg==')
print('Authenticating with {0} scheme, token: {1}'.format(parsed['scheme'], parsed['token']))
输出:
Authenticating with Basic scheme, token: Zm9vOmJhcg==
将它们整合到一个Authenticator
类中:
import pyparsing as pp
from base64 import b64decode
import re
class Authenticator:
def __init__(self):
"""
Use pyparsing to create a parser for Authentication headers
"""
tchar = "!#$%&'*+-.^_`|~" + pp.nums + pp.alphas
t68char = '-._~+/' + pp.nums + pp.alphas
token = pp.Word(tchar)
token68 = pp.Combine(pp.Word(t68char) + pp.ZeroOrMore('='))
scheme = token('scheme')
auth_header = pp.Keyword('Authorization')
name = pp.Word(pp.alphas, pp.alphanums)
value = pp.quotedString.setParseAction(pp.removeQuotes)
name_value_pair = name + pp.Suppress('=') + value
params = pp.Dict(pp.delimitedList(pp.Group(name_value_pair)))
credentials = scheme + (token68('token') ^ params('params'))
# the moment of truth...
self.auth_parser = auth_header + pp.Suppress(':') + credentials
def authenticate(self, auth_header):
"""
Parse auth_header and call the correct authentication handler
"""
authenticated = False
try:
parsed = self.auth_parser.parseString(auth_header)
scheme = parsed['scheme']
details = parsed['token'] if 'token' in parsed.keys() else parsed['params']
print('Authenticating using {0} scheme'.format(scheme))
try:
safe_scheme = re.sub("[!#$%&'*+-.^_`|~]", '_', scheme.lower())
handler = getattr(self, 'auth_handle_' + safe_scheme)
authenticated = handler(details)
except AttributeError:
print('This is a valid Authorization header, but we do not handle this scheme yet.')
except pp.ParseException as ex:
print('Not a valid Authorization header')
print(ex)
return authenticated
# The following methods are fake, of course. They should use what's passed
# to them to actually authenticate, and return True/False if successful.
# For this demo I'll just print some of the values used to authenticate.
@staticmethod
def auth_handle_basic(token):
print('- token is {0}'.format(token))
try:
username, password = b64decode(token).decode().split(':', 1)
except Exception:
raise DecodeError
print('- username is {0}'.format(username))
print('- password is {0}'.format(password))
return True
@staticmethod
def auth_handle_bearer(token):
print('- token is {0}'.format(token))
return True
@staticmethod
def auth_handle_digest(params):
print('- username is {0}'.format(params['username']))
print('- cnonce is {0}'.format(params['cnonce']))
return True
@staticmethod
def auth_handle_aws4_hmac_sha256(params):
print('- Signature is {0}'.format(params['Signature']))
return True
要测试这个类:
tests = [
'Authorization: Digest qop="chap", realm="testrealm@example.com", username="Foobar", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"',
'Authorization: Bearer cn389ncoiwuencr',
'Authorization: Basic Zm9vOmJhcg==',
'Authorization: AWS4-HMAC-SHA256 Credential="AKIAIOSFODNN7EXAMPLE/20130524/us-east-1/s3/aws4_request", SignedHeaders="host;range;x-amz-date", Signature="fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024"',
'Authorization: CrazyCustom foo="bar", fizz="buzz"',
]
authenticator = Authenticator()
for test in tests:
authenticator.authenticate(test)
print()
哪些输出:
Authenticating using Digest scheme
- username is Foobar
- cnonce is 5ccc069c403ebaf9f0171e9517f40e41
Authenticating using Bearer scheme
- token is cn389ncoiwuencr
Authenticating using Basic scheme
- token is Zm9vOmJhcg==
- username is foo
- password is bar
Authenticating using AWS4-HMAC-SHA256 scheme
- signature is fe5f80f77d5fa3beca038a248ff027d0445342fe2855ddc963176630326f1024
Authenticating using CrazyCustom scheme
This is a valid Authorization header, but we do not handle this scheme yet.
将来,如果我们希望处理 CrazyCustom,我们只需添加
def auth_handle_crazycustom(params):
如果这些组件总是在那里,那么正则表达式就可以解决问题:
test = '''Authorization: Digest qop="chap", realm="testrealm@host.com", username="Foobear", response="6629fae49393a05397450978507c4ef1", cnonce="5ccc069c403ebaf9f0171e9517f40e41"'''
import re
re_auth = re.compile(r"""
Authorization:\s*(?P<protocol>[^ ]+)\s+
qop="(?P<qop>[^"]+)",\s+
realm="(?P<realm>[^"]+)",\s+
username="(?P<username>[^"]+)",\s+
response="(?P<response>[^"]+)",\s+
cnonce="(?P<cnonce>[^"]+)"
""", re.VERBOSE)
m = re_auth.match(test)
print m.groupdict()
产生:
{ 'username': 'Foobear',
'protocol': 'Digest',
'qop': 'chap',
'cnonce': '5ccc069c403ebaf9f0171e9517f40e41',
'realm': 'testrealm@host.com',
'response': '6629fae49393a05397450978507c4ef1'
}
我建议找到一个正确的库来解析 http 标头,不幸的是我无法调用任何。 :(
一段时间检查下面的代码段(它应该主要工作):
input= """
Authorization: Digest qop="chap",
realm="testrealm@host.com",
username="Foob,ear",
response="6629fae49393a05397450978507c4ef1",
cnonce="5ccc069c403ebaf9f0171e9517f40e41"
"""
field, sep, value = input.partition(":")
if field.endswith('Authorization'):
protocol, sep, opts_str = value.strip().partition(" ")
opts = {}
for opt in opts_str.split(",\n"):
key, value = opt.strip().split('=')
key = key.strip(" ")
value = value.strip(' "')
opts[key] = value
opts['protocol'] = protocol
print opts
您使用 PyParsing 的原始概念将是最好的方法。 您隐含地要求的是需要语法的东西……也就是说,正则表达式或简单的解析例程总是很脆弱,这听起来像是您试图避免的东西。
在 Google App Engine 上进行 pyparsing似乎很容易: 如何在 Google App Engine 上设置 PyParsing?
所以我会这样做,然后从 rfc2617 实现完整的 HTTP 身份验证/授权标头支持。
http 摘要 Authorization 标头字段有点奇怪。 它的格式类似于rfc 2616的 Cache-Control 和 Content-Type 标头字段的格式,但只是不同到不兼容。 如果您仍在寻找比正则表达式更智能、更易读的库,您可以尝试使用str.split()删除 Authorization: Digest 部分,并使用Werkzeug的 http 模块中的parse_dict_header()解析其余部分。 (Werkzeug 可以安装在 App Engine 上。)
Nadia 的正则表达式只匹配参数值的字母数字字符。 这意味着它无法解析至少两个字段。 即uri和qop。 根据 RFC 2617,uri 字段是请求行(即 HTTP 请求的第一行)中字符串的副本。 如果值是“auth-int”,则由于非字母数字“-”,qop 无法正确解析。
这个修改后的正则表达式允许 URI(或任何其他值)包含除 ' '(空格)、'"'(qoute)或 ','(逗号)之外的任何内容。这可能比它需要的更宽容,但应该'不会导致正确形成的 HTTP 请求出现任何问题。
reg re.compile('(\w+)[:=] ?"?([^" ,]+)"?')
额外提示:从那里开始,将 RFC-2617 中的示例代码转换为 python 是相当直接的。 使用python的md5 API,“MD5Init()”变成“m = md5.new()”,“MD5Update()”变成“m.update()”,“MD5Final()”变成“m.digest()”。
如果您的响应来自一个从不变化的字符串,并且行数与要匹配的表达式一样多,您可以将其拆分为一个名为authentication_array
的换行数组,并使用正则表达式:
pattern_array = ['qop', 'realm', 'username', 'response', 'cnonce']
i = 0
parsed_dict = {}
for line in authentication_array:
pattern = "(" + pattern_array[i] + ")" + "=(\".*\")" # build a matching pattern
match = re.search(re.compile(pattern), line) # make the match
if match:
parsed_dict[match.group(1)] = match.group(2)
i += 1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.