
Regex for extracting all URLs from a dict-like string

This is the string I need to extract the URLs from:

s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"

This is the code I have tried so far:

urls = re.findall('https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', s)

But it only returns repetitions of this URL:

    ['https://www.riteaid.com']

Since, as you mention, the string is dict-like, you can use a regex written for this specific format.

s = "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},'0370009':{url:'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009'},'0303249':{url:'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249'},'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"

import re

urls = re.findall(r"url:'(https?://.*?)'}", s)

result:
['https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442',
 'https://www.riteaid.com/shop/rite-aid-pharmacy-epsom-salt-first-aid-6-lb-2-72-kg-0370009',
 'https://www.riteaid.com/shop/huggies-natural-care-unscented-baby-wipes-soft-pack-56-count-0303249',
 'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568']

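Explanation

url:' : the literal string url:'

( ... ) : capture group; re.findall returns only the text matched inside it

https? : the literal string http followed by an optional literal character "s"

:// : literal string

.*? : any characters, non-greedy (as few as possible)

'} : the literal string '} that closes each entry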
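
As a side note on why the attempt in the question stopped at the host name: its character class `[-\w.]` does not include `/`, so each match ends right before the first path separator. A minimal sketch of the difference, using a shortened two-entry version of the string (the names `question_pattern` and `answer_pattern` are only for illustration):

```python
import re

# Shortened version of the string from the question (two entries only).
s = (
    "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},"
    "'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
)

# Pattern from the question: '/' is not in the character class,
# so each match stops right after the host name.
question_pattern = r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+"
hosts = re.findall(question_pattern, s)

# Pattern from the answer: capture everything up to the closing '}.
answer_pattern = r"url:'(https?://.*?)'}"
urls = re.findall(answer_pattern, s)

print(hosts)
print(urls)
```

The first list contains only the host part, once per entry, while the second contains the full URLs.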

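If you have to use a regex and, as in the current example, match between {url:' and '}, you can use a positive lookbehind (?<= and a positive lookahead (?= together with a negated character class [^']+ to match the URL. [^']+ matches any character other than ' one or more times.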

(?<={url:')[^']+(?='})
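
In Python this pattern can be passed to `re.findall` as it is, because the lookbehind has a fixed width, which the `re` module requires. A small sketch, assuming a shortened two-entry version of the string from the question:

```python
import re

# Shortened version of the string from the question (two entries only).
s = (
    "'0352442':{url:'https://www.riteaid.com/shop/nexium-24hr-42-ct-capsules-0352442'},"
    "'0398568':{url:'https://www.riteaid.com/shop/rite-aid-sterile-pads-4-x4-25-ea-0398568'},}"
)

# The lookbehind and lookahead only assert the surrounding text without
# consuming it, so the whole match is the URL and no capture group is needed.
pattern = r"(?<={url:')[^']+(?='})"
urls = re.findall(pattern, s)

for url in urls:
    print(url)
```

Since the pattern has no capture group, `re.findall` returns the full matches, which here are the URLs themselves.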


You could also make the pattern less strict for the example data by leaving out the leading { and the trailing }:

(?<=url:')[^']+(?=')
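
To illustrate what the looser pattern buys you, the sketch below runs both variants on a hypothetical entry that has an extra key after the URL. The sample string and the `example.com` address are assumptions made for illustration and do not come from the question:

```python
import re

# Hypothetical entry (not from the question): another key follows the URL,
# so the URL's closing quote is followed by ',' instead of '}'.
s = "'0001':{url:'https://example.com/shop/item-0001',name:'item'},"

# Strict variant: requires {url:' before and '} after the URL.
strict = re.findall(r"(?<={url:')[^']+(?='})", s)

# Relaxed variant: only requires url:' before and ' after the URL.
relaxed = re.findall(r"(?<=url:')[^']+(?=')", s)

print(strict)
print(relaxed)
```

Here the strict pattern finds nothing and the relaxed one still returns the URL. The trade-off is that the relaxed pattern would also match the value of any other key whose name ends in `url`, so which one fits depends on how regular the real data is.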

