How to parse user agent string using Python

<field name="http.user_agent" showname="User-Agent: CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)\r\n" size="62" pos="542" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>

<field name="http.user_agent" showname="User-Agent: HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5\r\n" size="67" pos="570" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5" value="557365722d4167656e743a204854432053747265616d696e6720506c61796572206874635f777765202f20312e30202f206874635f7669766f202f20322e332e350d0a"/>

<field name="http.user_agent" showname="User-Agent: AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)\r\n" size="85" pos="639" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)" value="557365722d4167656e743a204170706c65436f72654d656469612f312e302e302e38433134382028695061643b20553b20435055204f5320345f325f31206c696b65204d6163204f5320583b2073765f7365290d0a"/>

The user-agent samples I've got are listed above. Is there a Python module I can use to parse the user-agent string? From these samples, I want to get output like:

Android
HTC Streaming Player
iPad

And if it is a PC user, I want to get the web browser type.
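The samples above appear to be `<field>` elements from Wireshark's PDML (XML) export, so the user-agent string has to be pulled out of them before any parsing: the `show` attribute holds the decoded header value, and `value` is the raw header line (`User-Agent: ...\r\n`) in hex. A minimal standard-library sketch, assuming the fields are wrapped in one root element; the `<packet>` wrapper and the helper names `user_agents` and `decode_value` are illustrative, not part of any library:

```python
import xml.etree.ElementTree as ET

# Fields in the shape of the question's PDML samples, wrapped in a single
# (assumed) <packet> root so the fragment parses as one XML document.
PDML = """<packet>
<field name="http.user_agent" show="CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)" value="557365722d4167656e743a20434f52452f362e3530362e342e31204f70656e434f52452f322e303220284c696e75783b416e64726f696420322e32290d0a"/>
<field name="http.user_agent" show="HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5"/>
<field name="http.user_agent" show="AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)"/>
</packet>"""


def user_agents(pdml_text):
    """Return the User-Agent strings of all http.user_agent fields."""
    root = ET.fromstring(pdml_text)
    # iter() searches the whole tree, so nested <proto> elements also work
    return [field.get("show") for field in root.iter("field")
            if field.get("name") == "http.user_agent"]


def decode_value(hex_value):
    """Decode a field's hex 'value' attribute back into the raw header line."""
    return bytes.fromhex(hex_value).decode("latin-1").rstrip("\r\n")


for ua in user_agents(PDML):
    print(ua)

first = ET.fromstring(PDML).find("field")
print(decode_value(first.get("value")))
```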

There is a library called httpagentparser for that:

>>> import httpagentparser
>>> s = "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9"
>>> print(httpagentparser.simple_detect(s))
('Linux', 'Chrome 5.0.307.11')
>>> print(httpagentparser.detect(s))
{'os': {'name': 'Linux'},
 'browser': {'version': '5.0.307.11', 'name': 'Chrome'}}

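Werkzeug used to have a user agent parser built in. It was deprecated in Werkzeug 2.0 and removed in 2.1, so the example below only works on older versions.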

http://werkzeug.pocoo.org/docs/quickstart/?highlight=user_agent#header-parsing

from werkzeug.test import create_environ
from werkzeug.wrappers import Request

environ = create_environ()
environ.update(HTTP_USER_AGENT=('Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    ' AppleWebKit/537.36 (KHTML, like Gecko)'
    ' Chrome/76.0.3809.100 Safari/537.36'))
request = Request(environ)

# Prints 'chrome' (on Werkzeug versions before 2.1)
print(request.user_agent.browser)
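The Werkzeug example works by putting the header into the WSGI environ: under PEP 3333, request headers appear as `HTTP_*` keys, so `User-Agent` becomes `HTTP_USER_AGENT`. If you only need the raw string (to hand to httpagentparser or another parser), a bare WSGI application can read it the same way. A standard-library sketch using `wsgiref`; the `app` function and the dummy `start_response` are illustrative:

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Minimal WSGI app that echoes the raw User-Agent header back."""
    # The WSGI server copies the User-Agent request header into HTTP_USER_AGENT
    ua = environ.get("HTTP_USER_AGENT", "")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [ua.encode("utf-8")]


# Build a test environ by hand instead of running a real server
environ = {}
setup_testing_defaults(environ)  # adds the keys a WSGI environ must contain
environ["HTTP_USER_AGENT"] = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36"
)

# A dummy start_response that just records the status line
statuses = []
body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
print(statuses[0], body.decode("utf-8"))
```

With a real server, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, the browser's header arrives under the same key.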

This answer is not about an open-source project, but it should be useful to anyone researching how to parse HTTP user-agent strings to obtain device intelligence.

WURFL is a long-established tool for User-Agent (and, more generally, HTTP request) analysis that returns easily consumable device and browser information. Thanks to its proprietary database, it is the de facto standard in the ad tech industry for squeezing the last drop of information out of HTTP requests. In practice, the code looks something like this:

from pywurfl.wurfl import Wurfl

# Create a WURFL engine. The path of the installed wurfl.zip may vary:
# for example, on OS X it is `/usr/local/share/wurfl/wurfl.zip`,
# and on Linux it is `/usr/share/wurfl/wurfl.zip`.
wurfl = Wurfl('/usr/share/wurfl/wurfl.zip')

# Look up an HTTP request
http_request = {
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp",
    "user-agent": "Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36",
}
dev = wurfl.parse_headers(http_request)

# You can also look up a device with just the user-agent string:
# dev = wurfl.parse_useragent(user_agent)

# Retrieve some property and capability values

# WURFL device ID:
print("device id =", dev.id)

# Some static capabilities:
static_capabilities = ["model_name", "brand_name", "device_os"]

# Retrieve the value of a single static capability:
print("get_capability('model_name') =",
      dev.get_capability(static_capabilities[0]))

# Retrieve the value of many static capabilities at once:
print("get_capabilities(static_capabilities) =",
      dev.get_capabilities(static_capabilities))

# Some virtual capabilities:
virtual_capabilities = ["complete_device_name", "form_factor"]

# Retrieve the value of a single virtual capability:
print("get_virtual_capability('complete_device_name') =",
      dev.get_virtual_capability(virtual_capabilities[0]))

# Retrieve the value of many virtual capabilities at once:
print("get_virtual_capabilities(virtual_capabilities) =",
      dev.get_virtual_capabilities(virtual_capabilities))

# Make sure you release the device when you are finished
dev.release()

The code above would print:

device id = samsung_sm_g981u_ver1_subuau1
get_capability('model_name') = SM-G981U1
get_capabilities(static_capabilities) = {'model_name': 'SM-G981U1', 'brand_name': 'Samsung', 'device_os': 'Android'}
get_virtual_capability('complete_device_name') = Samsung SM-G981U1 (Galaxy S20 5G)
get_virtual_capabilities(virtual_capabilities) = {'complete_device_name': 'Samsung SM-G981U1 (Galaxy S20 5G)', 'form_factor': 'Smartphone'}

More info can be found here.
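Unlike a pure user-agent parser, `parse_headers` in the WURFL example takes a dict of request headers (with lowercase names in that example). If you start from a raw HTTP request, such as one reassembled from a packet capture like the question's, Python's standard library can build such a dict. A minimal sketch; the `headers_dict` helper is hypothetical, and lowercasing the names simply mirrors the example above rather than a documented PyWURFL requirement:

```python
import io
from http.client import parse_headers

# A raw HTTP/1.1 request as it might arrive on the wire; the header values
# mirror the WURFL example above.
RAW_REQUEST = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Accept-Encoding: gzip, deflate, br\r\n"
    b"Accept-Language: en-US,en;q=0.9\r\n"
    b"User-Agent: Mozilla/5.0 (Linux; Android 10; SM-G981U1) AppleWebKit/537.36 "
    b"(KHTML, like Gecko) Chrome/80.0.3987.132 Mobile Safari/537.36\r\n"
    b"\r\n"
)


def headers_dict(raw):
    """Turn a raw HTTP request into a {lowercase-name: value} dict."""
    fp = io.BytesIO(raw)
    fp.readline()  # skip the request line ("GET / HTTP/1.1")
    msg = parse_headers(fp)  # reads header lines up to the blank line
    # HTTP header names are case-insensitive, so normalise them to lower case
    return {name.lower(): value for name, value in msg.items()}


headers = headers_dict(RAW_REQUEST)
print(headers["user-agent"])
```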

For those who want to try WURFL (and PyWURFL specifically) without obtaining a trial license from ScientiaMobile, my company has recently released a version of WURFL called WURFL Microservice, which is available from the AWS, Azure and GCP marketplaces (as well as from ScientiaMobile itself). Python is fully supported for that product too, although the syntax is slightly different because the product relies on a server-side component in the cloud for updates:

from wmclient import WmClient

try:
    # Connect to a WURFL Microservice server listening on localhost:8080
    client = WmClient.create("http", "localhost", 8080, "")
    # ... (part of the example omitted here; see the full example)
    ua = ("Mozilla/5.0 (Linux; Android 7.1.1; ONEPLUS A5000 Build/NMF26X) AppleWebKit/537.36 (KHTML, like Gecko) "
          "Chrome/56.0.2924.87 Mobile Safari/537.36")

    # Ask the server to return only these capabilities
    client.set_requested_static_capabilities(["brand_name", "model_name"])
    client.set_requested_virtual_capabilities(["is_smartphone", "form_factor"])
    print()
    print("Detecting device for user-agent: " + ua)

    # Perform a device detection calling WM server API
    device = client.lookup_useragent(ua)
    # ... (part of the example omitted here; see the full example)

    # Let's get the device capabilities and print some of them
    capabilities = device.capabilities
    print("Detected device WURFL ID: " + capabilities["wurfl_id"])
    print("Device brand & model: " + capabilities["brand_name"] + " " + capabilities["model_name"])
    print("Detected device form factor: " + capabilities["form_factor"])
    # Capability values are strings, so compare against "true"
    if capabilities["is_smartphone"] == "true":
        print("This is a smartphone")
except Exception as e:
    # Network errors or server-side problems end up here
    print("An error has occurred:", e)

A fully fledged example, with a reference to the client code on GitHub, can be found here.

Disclosure: I work for the company that provides the library described here.

You could try writing your own with regular expressions: http://docs.python.org/library/re.html, or take a look at this: http://pypi.python.org/pypi/httpagentparser
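Following the regular-expression suggestion, here is a minimal sketch of a hand-rolled classifier. The rule tables and the `classify` helper are hypothetical: they cover only the question's three samples plus a handful of common desktop browsers. First-match-wins ordering is what keeps, for example, Chrome (whose string also contains `Safari`) from being reported as Safari:

```python
import re

# Hypothetical device rules: (pattern, label). The first matching rule wins,
# so more specific patterns must come before generic ones.
DEVICE_RULES = [
    (re.compile(r"HTC Streaming Player", re.I), "HTC Streaming Player"),
    (re.compile(r"\biPad\b"), "iPad"),
    (re.compile(r"\biPhone\b"), "iPhone"),
    (re.compile(r"\bAndroid\b"), "Android"),
]

# Hypothetical desktop browser rules, tried only if no device rule matched.
# Edge and Opera UAs also contain "Chrome/", and Chrome UAs contain "Safari/",
# so the more specific browsers are listed first.
BROWSER_RULES = [
    (re.compile(r"\bEdg(?:e|A|iOS)?/([\d.]+)"), "Edge"),
    (re.compile(r"\bOPR/([\d.]+)"), "Opera"),
    (re.compile(r"\bFirefox/([\d.]+)"), "Firefox"),
    (re.compile(r"\bChrome/([\d.]+)"), "Chrome"),
    (re.compile(r"\bVersion/([\d.]+).*\bSafari/"), "Safari"),
    (re.compile(r"\bMSIE ([\d.]+)"), "Internet Explorer"),
]


def classify(ua):
    """Return a device label, 'Browser version' for desktop UAs, or 'Unknown'."""
    for pattern, label in DEVICE_RULES:
        if pattern.search(ua):
            return label
    for pattern, name in BROWSER_RULES:
        match = pattern.search(ua)
        if match:
            return f"{name} {match.group(1)}"
    return "Unknown"


for sample in [
    "CORE/6.506.4.1 OpenCORE/2.02 (Linux;Android 2.2)",
    "HTC Streaming Player htc_wwe / 1.0 / htc_vivo / 2.3.5",
    "AppleCoreMedia/1.0.0.8C148 (iPad; U; CPU OS 4_2_1 like Mac OS X; sv_se)",
    "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/532.9 "
    "(KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9",
]:
    print(classify(sample))
```

Real-world user-agent strings vary far more than these rules allow for, which is the main argument for a maintained library such as httpagentparser.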
