How do I extract the Url data from my string?
I have the following string, which contains many Url values. How do I extract the Url that follows each DataUrl key in this string, so that I get a list of Urls, for example: americanexpress.com, vice.com, chegg.com?
{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}
I have tried various FindAll expressions.
Python has a built-in package called json, which can be used to work with JSON data.
If the string is valid JSON, you can parse it into Python lists and dictionaries with json.loads and then read DataUrl from each record.
Please refer to https://www.w3schools.com/python/python_json.asp
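As a rough illustration of that suggestion, here is a minimal sketch. It assumes the data is already valid JSON, meaning double-quoted strings wrapped in a list; the shortened two-record sample is hypothetical and is not the exact string from the question, which uses single quotes and would be rejected by json.loads as posted.

```python
import json

# Hypothetical, shortened sample in valid JSON form:
# double-quoted strings, records wrapped in a list.
text = (
    '[{"DataUrl": "americanexpress.com", "Global": {"Rank": "362"}}, '
    '{"DataUrl": "vice.com", "Global": {"Rank": "208"}}]'
)

data = json.loads(text)  # parse the JSON text into a list of dicts
urls = [item["DataUrl"] for item in data]  # take the DataUrl of each record
print(urls)  # ['americanexpress.com', 'vice.com']
```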
It looks like part of JSON data, so if you have the complete JSON data you could use the json module to load it and look up DataUrl in the resulting dictionaries.
If you have incomplete JSON data, you can use a regex instead:
text = '''{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}'''
import re
urls = re.findall("'DataUrl': '([^']*)'", text)
print(urls)
Result
['americanexpress.com', 'vice.com', 'chegg.com', 'mlb.com', 'xnxx.com']
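The pattern above expects exactly one space after the colon. If the spacing in your input varies, a slightly more tolerant pattern may help. This is only a sketch: the shortened sample and the names sample, pattern and urls_flexible are made up for illustration, and the assumption is that the values never contain a single quote.

```python
import re

# Hypothetical, shortened sample: the second record has no space after the colon,
# and the text is truncated at the end like the string in the question.
sample = (
    "{'DataUrl': 'americanexpress.com', 'Global': {'Rank': '362'}}, "
    "{'DataUrl':'vice.com', 'Global': {'Rank': '208'}"
)

# \s* allows any amount of whitespace (including none) around the colon;
# ([^']*) captures everything up to the closing quote of the value.
pattern = re.compile(r"'DataUrl'\s*:\s*'([^']*)'")
urls_flexible = pattern.findall(sample)
print(urls_flexible)  # ['americanexpress.com', 'vice.com']
```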
You can also try to do it with .split("{'DataUrl': '") and .split("',"):
text = '''{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}'''
urls = text.split("{'DataUrl': '")
urls = [item.split("',")[0] for item in urls if item]
print(urls)
Result
['americanexpress.com', 'vice.com', 'chegg.com', 'mlb.com', 'xnxx.com']
If you had complete and correctly formatted JSON, with " instead of ', then you could use the json module.
Here I use the complete data (wrapped in [ ] and with the final } in place) and replace ' with " to make it valid JSON:
text = '''[{'DataUrl': 'americanexpress.com', 'Country': {'Rank': '96', 'Reach': {'PerMillion': '7350'}, 'PageViews': {'PerMillion': '600.2', 'PerUser': '3.6'}}, 'Global': {'Rank': '362'}}, {'DataUrl': 'vice.com', 'Country': {'Rank': '97', 'Reach': {'PerMillion': '15703.61'}, 'PageViews': {'PerMillion': '489.97', 'PerUser': '1.38'}}, 'Global': {'Rank': '208'}}, {'DataUrl': 'chegg.com', 'Country': {'Rank': '98', 'Reach': {'PerMillion': '6280'}, 'PageViews': {'PerMillion': '882.3', 'PerUser': '6.2'}}, 'Global': {'Rank': '402'}}, {'DataUrl': 'mlb.com', 'Country': {'Rank': '99', 'Reach': {'PerMillion': '7280'}, 'PageViews': {'PerMillion': '564.1', 'PerUser': '3.42'}}, 'Global': {'Rank': '427'}}, {'DataUrl': 'xnxx.com', 'Country': {'Rank': '100', 'Reach': {'PerMillion': '5560'}, 'PageViews': {'PerMillion': '1271', 'PerUser': '10.1'}}, 'Global': {'Rank': '95'}}]'''
text = text.replace("'", '"')
import json
data = json.loads(text)
urls = [item['DataUrl'] for item in data]
print(urls)
Result
['americanexpress.com', 'vice.com', 'chegg.com', 'mlb.com', 'xnxx.com']
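Replacing every ' with " would break if a value ever contained an apostrophe. Since the text already looks like Python literal syntax, another option (not part of the answer above, just an alternative worth trying) is ast.literal_eval from the standard library, which parses Python literals without executing code. The sketch below uses a shortened, hypothetical sample and assumes the text is complete, that is, wrapped in [ ] with all braces closed.

```python
import ast

# Hypothetical, shortened sample: complete Python-literal text with single quotes.
text = (
    "[{'DataUrl': 'americanexpress.com', 'Global': {'Rank': '362'}}, "
    "{'DataUrl': 'vice.com', 'Global': {'Rank': '208'}}]"
)

data = ast.literal_eval(text)  # safely evaluate the literal into a list of dicts
urls = [item['DataUrl'] for item in data]
print(urls)  # ['americanexpress.com', 'vice.com']
```

If the text is truncated, as in the question, literal_eval raises an error, so the regex or split approach is still the fallback in that case.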