
Parsing HTTP User-Agent string

What is the best method to parse a User-Agent string in Python to reliably detect:

  1. Browser
  2. Browser version
  3. OS

Or is there perhaps a helper library that does it?

I finally decided to write my own, and I am happy with the outcome. Please feel free to use it, modify it, send me patches, etc.

It's here: http://pypi.python.org/pypi/httpagentparser

Usage example:

>>> import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) \
        Chrome/5.0.307.11 Safari/532.9"
>>> print(httpagentparser.simple_detect(s))
('Linux', 'Chrome 5.0.307.11')
>>> print(httpagentparser.detect(s))
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}

>>> s = "Mozilla/5.0 (Linux; U; Android 2.3.5; en-in; HTC_DesireS_S510e Build/GRJ90) \
        AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1"
>>> print(httpagentparser.simple_detect(s))
('Android Linux 2.3.5', 'Safari 4.0')
>>> print(httpagentparser.detect(s))
{'dist': {'version': '2.3.5', 'name': 'Android'},
'os': {'name': 'Linux'},
'browser': {'version': '4.0', 'name': 'Safari'}}

UASparser for Python by Hicro Kee. It automatically updates its data file and cache from a remote server, with version checking.

Werkzeug has user-agent parsing built in.

New link (Jun 2018): http://werkzeug.pocoo.org/docs/0.14/utils/#module-werkzeug.useragents

The other responses to this question are rather old now. I believe the new standard in browser User-Agent parsing is Browserscope's user agent parser.

It is also conveniently available, with the exact same matching patterns, in many other languages. Someday you might also want to parse some UA strings in JavaScript, and you won't need to worry about inconsistent parsing.
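
To illustrate what sharing matching patterns across languages can look like, here is a minimal sketch using only the standard library. The pattern table and the `parse_browser` helper are made up for this example and are not taken from the Browserscope project's data; the point is only that an ordered list of regexes kept as data can be applied the same way by any language.

```python
import re

# Hypothetical pattern table, written for this sketch only. A shared project
# could keep something like this in one data file used by every language port.
PATTERNS = [
    {"family": "Chrome", "regex": r"Chrome/(\d+)\.(\d+)"},
    {"family": "Firefox", "regex": r"Firefox/(\d+)\.(\d+)"},
    {"family": "Safari", "regex": r"Version/(\d+)\.(\d+).*Safari/"},
]

# Compile once; order matters because the first matching pattern wins.
COMPILED = [(p["family"], re.compile(p["regex"])) for p in PATTERNS]


def parse_browser(ua_string):
    """Return the first matching family with its major/minor version."""
    for family, pattern in COMPILED:
        match = pattern.search(ua_string)
        if match:
            return {"family": family,
                    "major": match.group(1),
                    "minor": match.group(2)}
    return {"family": "Other", "major": None, "minor": None}


ua = ("Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 "
      "(KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9")
print(parse_browser(ua))
```

Chrome is listed before Safari on purpose, since Chrome's User-Agent string also contains a `Safari/` token.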

Having run these suggestions against the full corpus of Firefox User Agents, I've found that their parsing of version numbers for comparison is quite poor.

If that's what you need, I suggest that you take a look at UAparser, which used to be part of the browserscope project. Documentation here.
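
As a rough illustration of why version parsing matters for comparison, the sketch below turns a version string into a tuple of integers. The `version_key` helper is hypothetical and not part of any library mentioned here; it also ignores pre-release suffixes such as "b5", which a real parser would have to handle.

```python
import re


def version_key(version):
    """Turn '5.0.307.11' into (5, 0, 307, 11) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


# Plain string comparison gets multi-digit components wrong.
print("10.0" < "9.0")                            # True (lexicographic)
print(version_key("10.0") < version_key("9.0"))  # False (numeric)
```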

The Browser Cap Parser should work. It may be a bit slow, though.

However, if you wish to parse all this on the Python side, you can use the XML/INI files provided at http://browsers.garykeith.com/downloads.asp to do lookups on the user agent. This is the same file that is used by PHP's get_browser() function.
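
A lookup against such an INI file could be sketched as follows, using only the standard library. The sample data, its keys, and the helper names are assumptions made for this example; the real file is far larger and its layout may differ. Section names are treated as wildcard patterns (`*` and `?`), and picking the longest matching pattern is a simple heuristic chosen here, not necessarily what get_browser() does.

```python
import configparser
import re

# Tiny made-up sample in the style of a browser-capabilities INI file.
SAMPLE_INI = """
[*Chrome/5.0*]
Browser=Chrome
Version=5.0
Platform=Linux

[*Firefox/*]
Browser=Firefox
Version=0
Platform=unknown

[*]
Browser=Default Browser
Version=0
Platform=unknown
"""


def pattern_to_regex(pattern):
    """Translate a '*' / '?' wildcard pattern into an anchored regex."""
    escaped = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile("^" + escaped + "$", re.IGNORECASE)


def load_rules(ini_text):
    """Read every section as (pattern text, compiled regex, properties)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written in the file
    parser.read_string(ini_text)
    return [(name, pattern_to_regex(name), dict(parser[name]))
            for name in parser.sections()]


def lookup(ua_string, rules):
    """Return the properties of the longest matching pattern, or None."""
    matches = [(len(name), props) for name, regex, props in rules
               if regex.match(ua_string)]
    if not matches:
        return None
    return max(matches, key=lambda item: item[0])[1]


rules = load_rules(SAMPLE_INI)
ua = ("Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 "
      "(KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9")
print(lookup(ua, rules))
```

Scanning every pattern for each request is linear in the size of the file, which is one plausible reason this kind of lookup can be slow; caching results per User-Agent string would be a natural next step.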

As this is not about an open source solution, I doubt this will become the first answer. Anyway, when it comes to User-Agent analysis, the de facto standard is WURFL (now a commercial product).

Here is a reference to the technical docs:

https://docs.scientiamobile.com/documentation/infuze/infuze-python-module-user-guide

In addition to that, WURFL Microservice is available on the major cloud providers' marketplaces and also supports a Python client.
