Regex for extracting all URLs from a dict-like string
This is the string I have to extract the URLs from:
s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
This is the code I have tried so far:
urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', s)
But it only prints repeats of this URL:
['https://www.riteaid.com']
Since you describe this as a dict-like string, you need a regex written for this specific format.
import re
s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
urls = re.findall(r"url:'(https?://.*?)'}", s)
Result:
['https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442',
'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009',
'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249',
'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568']
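Explanation
url:'(http : literal string (the parenthesis opens the capture group)
s? : optional literal character "s"
:// : literal string
.*? : any character, matched non-greedily
'} : literal string (after the parenthesis that closes the capture group)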
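The breakdown above can be written directly into the pattern with `re.VERBOSE`, which lets each piece carry its own comment. This is only a sketch: the sample string is shortened to two entries from the question, and the names `annotated_pattern` and `annotated_urls` are made up for illustration.

```python
import re

# Shortened sample in the same shape as the string from the question.
s = ("'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},"
     "'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}")

# Same pattern as in the answer, split into commented pieces.
# In verbose mode, whitespace and "#" comments inside the pattern are ignored.
annotated_pattern = re.compile(
    r"""
    url:'(http   # literal text; the parenthesis opens the capture group
    s?           # optional literal s
    ://          # literal text
    .*?          # any characters, as few as possible (non-greedy)
    )'}          # close the capture group, then the literal text '}
    """,
    re.VERBOSE,
)

# With exactly one capture group, findall returns the captured text only.
annotated_urls = annotated_pattern.findall(s)
print(annotated_urls)
```

Because the pattern has a single capture group, `findall` returns just the URL and not the surrounding `url:'` and `'}`.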
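If you have to use a regex in the current example to match between {url:' and '}, you can use a positive lookbehind (?<= and a positive lookahead (?= and match the URL itself with a negated character class. [^']+ matches any character other than ' one or more times.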
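A minimal sketch of the lookaround approach described above might look like this. The exact pattern is reconstructed from that description, not quoted from the answer; the braces are escaped for readability, and the sample is shortened to two entries from the question.

```python
import re

# Shortened sample in the same shape as the string from the question.
s = ("'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},"
     "'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}")

# (?<=\{url:')  lookbehind: the match must be preceded by {url:'
# [^']+         one or more characters that are not a single quote
# (?='\})       lookahead: the match must be followed by '}
pattern = re.compile(r"(?<=\{url:')[^']+(?='\})")

# No capture group is needed: the lookarounds are not part of the match.
urls = pattern.findall(s)
print(urls)
```

Since the lookarounds consume no characters, each match is the bare URL.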
You can also make the pattern less strict than the sample data requires by omitting the leading { and the trailing }.
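A less strict variant, sketched under the same assumptions, only requires url:' before the match and a closing ' after it. The second sample string, without braces, is hypothetical and is there only to show that the braces are no longer needed.

```python
import re

# One entry from the question, in its original brace-wrapped form.
with_braces = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},"

# Hypothetical variation of the same entry without the surrounding braces.
without_braces = "'0352442':url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442',"

# (?<=url:')  lookbehind: preceded by url:' (no leading brace required)
# [^']+       one or more characters that are not a single quote
# (?=')       lookahead: followed by a closing quote (no trailing brace required)
relaxed_pattern = re.compile(r"(?<=url:')[^']+(?=')")

relaxed_urls = relaxed_pattern.findall(with_braces) + relaxed_pattern.findall(without_braces)
print(relaxed_urls)
```

The trade-off is that this version accepts any url:'...' pair, so it is more tolerant of format changes but also more likely to pick up unintended matches in other input.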