
Extract dates from url with regex

I would like to parse the dates from the following url:

url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'

This is what I tried.

arrival_date = re.split('arrival=',url)
print(arrival_date[1])

You can use the regex (\d{4}-\d{2}-\d{2}), which matches "4 digits-2 digits-2 digits". Prefix it with the parameter name to pick out each date:

import re
url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'
date_regex = r"(\d{4}-\d{2}-\d{2})"
arrival_date = re.search(r"arrival=" + date_regex, url).group(1)
departure_date = re.search(r"departure=" + date_regex, url).group(1)
print(arrival_date)     # 2021-05-08
print(departure_date)   # 2021-05-16
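As an alternative to a regex (not something the answer above proposes), the standard library's urllib.parse can read the query string directly. This is a minimal sketch, assuming the parameters are always named arrival and departure and hold ISO-formatted dates; the helper name extract_dates is hypothetical.

```python
from datetime import date
from urllib.parse import urlsplit, parse_qs

url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'

def extract_dates(url):
    """Hypothetical helper: return (arrival, departure) as datetime.date objects."""
    # parse_qs maps each query parameter name to a list of its values
    query = parse_qs(urlsplit(url).query)
    # strip() tolerates stray whitespace around the value;
    # fromisoformat raises ValueError if the text is not YYYY-MM-DD
    arrival = date.fromisoformat(query["arrival"][0].strip())
    departure = date.fromisoformat(query["departure"][0].strip())
    return arrival, departure

arrival, departure = extract_dates(url)
print(arrival)     # 2021-05-08
print(departure)   # 2021-05-16
```

Unlike the regex, this does not depend on where the parameters sit in the URL, and it fails with a KeyError if either one is missing.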

Assuming the data is always in this format, you can do this:

dates = [param.split("=")[1] for param in url.split("&")[1:]]

This returns a two-element list containing both dates, and it fits on one line. To get only the arrival date, adjust the [1:] slice (for example, to [1:2]) to suit your needs.
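To make the effect of the slice concrete, here is a small sketch of the split approach. It assumes the query parameters always appear in the order adultsCount, arrival, departure; the URL path is shortened for readability, which is an assumption and not the original URL.

```python
# Shortened URL for illustration only (assumption); the query string is unchanged
url = "https://www.vrbo.com/el-gr/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16"

# Splitting on "&" gives three pieces; piece 0 holds the path and adultsCount
pieces = url.split("&")

# [1:] keeps arrival and departure
dates = [piece.split("=")[1] for piece in pieces[1:]]
# [1:2] keeps only arrival
arrival_only = [piece.split("=")[1] for piece in pieces[1:2]]
# [2:] keeps only departure
departure_only = [piece.split("=")[1] for piece in pieces[2:]]

print(dates)           # ['2021-05-08', '2021-05-16']
print(arrival_only)    # ['2021-05-08']
print(departure_only)  # ['2021-05-16']
```

Because the slices are positional, a reordered or extra parameter would silently return the wrong values.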


 