
Remove unwanted key-value pairs from a string

So I have the following string:

__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; domain=.coinmarketcap.com path=/

However, I need to remove the garbage from this, such as

expires=Sat, 09-Mar-19 03:35:03 GMT

or

domain=.coinmarketcap.com path=/

So that I'm left with only the three values:

__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974

Specify which keys you want to retain:

In [193]: keys = ['__cfduid', '_version', '__session']

Now call re.findall (import re first):

In [194]: ' '.join(re.findall(r'(?:{}).*?;'.format('|'.join(keys)), text))
Out[194]: '__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974;'

The regex (?:{}).*?; matches each selected key followed by everything up to the next semicolon, so only the key-value pairs for those keys are kept and everything else is discarded. This works as long as your string has a consistent structure, (key=value;)+, and every pair you want ends with a semicolon.
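If you want something a bit stricter, here is a minimal sketch of the same idea wrapped in a function. The name keep_pairs and the shortened sample string are made up for illustration. It escapes the key names, refuses to match a key in the middle of a longer name, and ends the value at the next semicolon, comma or whitespace instead of requiring a trailing semicolon:

```python
import re

def keep_pairs(text, keys):
    """Return 'key=value' pairs for the wanted keys, joined with '; '."""
    # Escape each key so characters special to regex are taken literally.
    alternation = '|'.join(re.escape(k) for k in keys)
    # (?<![\w-]) stops a key from matching inside a longer name.
    # [^;,\s]* ends the value at the next ';', ',' or whitespace.
    pattern = re.compile(r'(?<![\w-])(?:{})=[^;,\s]*'.format(alternation))
    return '; '.join(pattern.findall(text))

# Shortened, made-up stand-in for the string in the question.
sample = ("__cfduid=abc123; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; "
          "domain=.example.com; HttpOnly, _version=def456; path=/, "
          "__session=Zm9v%3D%3D--789; domain=.example.com path=/")

print(keep_pairs(sample, ['__cfduid', '_version', '__session']))
```

Unlike the one-liner, this leaves no trailing semicolon, and it still picks up a wanted pair that happens to be the last thing in the string.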

This is a more generic solution for any key that starts with an underscore.

import re
# [^;,\s]+ lets the value contain characters such as % and -, where \w+ would cut it short
str_list = re.findall(r"_\w+=[^;,\s]+", your_string)

out:
    ['__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503',
     '_version=a90f44e909c03fdad3caed1ec676a98472deb0f6',
     '__session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974']

re.findall returns a list, which you can join to get your desired output.

 "; ".join(str_list)

Another way to do it:

keys = ('__cfduid', '_version', '__session')
' '.join([x for x in text.split() if x.startswith(keys)])
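If you would rather have the values in a dict than in a joined string, a similar split-based approach could look like the sketch below. The function name pairs_to_dict and the shortened sample string are hypothetical. It assumes the pieces are separated by semicolons or commas and that the wanted values contain neither:

```python
import re

def pairs_to_dict(text, keys):
    """Collect the wanted key=value pairs into a dict."""
    wanted = set(keys)
    result = {}
    # Split on ';' or ',' and drop the whitespace that follows the separator.
    for chunk in re.split(r'[;,]\s*', text):
        # partition splits at the first '=' only, so a value that itself
        # contains '=' is kept whole.
        name, sep, value = chunk.strip().partition('=')
        if sep and name in wanted:
            result[name] = value
    return result

# Shortened, made-up stand-in for the string in the question.
sample = ("__cfduid=abc123; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; "
          "domain=.example.com; HttpOnly, _version=def456; path=/, "
          "__session=Zm9v%3D%3D--789; domain=.example.com path=/")

cookies = pairs_to_dict(sample, ('__cfduid', '_version', '__session'))
print('; '.join('{}={}'.format(k, v) for k, v in cookies.items()))
```

Here the key has to match exactly rather than just be a prefix, so a key like _version2 would not be picked up by accident.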

It looks like you are parsing a cookie string. In that case, you should use the standard library's cookie-parsing module: https://docs.python.org/2/library/cookie.html#Cookie.BaseCookie.load

>>> from Cookie import SimpleCookie
>>> s = SimpleCookie()
>>> s.load("__cfduid=dc3c9f85f65d39a5947d5f4850618237f1520566503; expires=Sat, 09-Mar-19 03:35:03 GMT; path=/; domain=.coinmarketcap.com; HttpOnly, _version=a90f44e909c03fdad3caed1ec676a98472deb0f6; path=/, __session=NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974; domain=.coinmarketcap.com path=/")
>>> [(k, s[k].value) for k in s.keys()]
[('__cfduid', 'dc3c9f85f65d39a5947d5f4850618237f1520566503'),
 ('_version', 'a90f44e909c03fdad3caed1ec676a98472deb0f6'),
 ('__session', 'NTgybXJTVFdKcjlrbG5JKsnaVm9V6SBhUWtxV0oxc3JZNTZUekRGb3RvYjFpZDF5WHNab2N0T3VxTDdzY1JnOGR0ZzdtUzdRZDQ3NjVwU2Lod93GG9lalMwMGNheUUybm45Q20rWWlSRUZ5YUlzNVZmd3h3b200TmR2cnRHUWY4OUxrVml3T2hMMUdrdXZOc0V6TnBxOHFBPT0tLTMyV0R3emYxME9OeDQ3cDJ4LzJycmc9PQ%3D%3D--67cb39476896467f47bdd13bb638fd5479883974')]

>>> s['__cfduid'].value
'dc3c9f85f65d39a5947d5f4850618237f1520566503'

(This is Python 2. In Python 3 the import is from http.cookies import SimpleCookie.)

This is a much better idea than attempting your own cookie parsing.
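For Python 3, a minimal sketch of the same approach is below. It uses a short, made-up single-cookie string rather than the comma-joined string from the question, which I have not checked against the Python 3 parser, so try it on your real input before relying on it:

```python
from http.cookies import SimpleCookie  # Python 3 home of SimpleCookie

cookie = SimpleCookie()
# Made-up example: one cookie followed by two of its attributes.
cookie.load("__cfduid=abc123; path=/; HttpOnly")

# Attributes such as path and HttpOnly are stored on the morsel,
# so only the real cookie names appear as keys.
values = {name: morsel.value for name, morsel in cookie.items()}
print(values)
print(cookie['__cfduid']['path'])
```

The attributes are still available on each morsel if you need them later, which is something the regex approaches throw away.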
