I would like to parse the dates from the following url:
url='https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'
This is what I tried.
arrival_date = re.split('arrival=',url)
print(arrival_date[1])
You can use the regex (\d{4}-\d{2}-\d{2}), which means "4 digits-2 digits-2 digits".
import re
url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'
date_regex = r"(\d{4}-\d{2}-\d{2})"
arrival_date = re.search(r"arrival=" + date_regex, url).group(1)
departure_date = re.search(r"departure=" + date_regex, url).group(1)
print(arrival_date) # 2021-05-08
print(departure_date) # 2021-05-16
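If the matched strings need to become real date objects, the same regex can feed `datetime.date.fromisoformat`. This is only a minimal sketch: `extract_date` is a hypothetical helper name, the URL is a shortened stand-in for the one above, and it assumes the dates are always in YYYY-MM-DD form (a syntactically matching but impossible date such as 2021-13-40 would raise `ValueError`).

```python
import re
from datetime import date

DATE_REGEX = r"(\d{4}-\d{2}-\d{2})"

def extract_date(url, param):
    """Return the date that follows `param=` in url, or None if it is absent."""
    # \s* tolerates stray whitespace after the equals sign
    match = re.search(re.escape(param) + r"=\s*" + DATE_REGEX, url)
    if match is None:
        return None
    return date.fromisoformat(match.group(1))

# Shortened stand-in URL with the same query string as the original
url = "https://www.vrbo.com/el-gr/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16"
arrival = extract_date(url, "arrival")
departure = extract_date(url, "departure")
print(arrival)                     # 2021-05-08
print((departure - arrival).days)  # 8
```

Returning None instead of calling `.group(1)` directly avoids an `AttributeError` when a parameter is missing from the URL.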
Assuming the data is always in this format, you can do this:
dates = list(map(lambda date: date.split("=")[1], url.split("&")[1:]))
This is a one-liner that returns a two-element list containing both dates. It relies on arrival and departure being the second and third query parameters. To get only the arrival date, adjust the [1:] slice, for example to [1:2].
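If the parameter order might change, one alternative is to let the standard library parse the query string instead of splitting on "&" by position. The following is a sketch rather than a drop-in replacement: it uses a shortened stand-in URL and assumes each parameter appears exactly once.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened stand-in URL with the same query string as the original
url = "https://www.vrbo.com/el-gr/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16"

# parse_qs maps each parameter name to a list of its values
query = parse_qs(urlsplit(url).query)

# strip() guards against stray whitespace around the values
arrival_date = query["arrival"][0].strip()
departure_date = query["departure"][0].strip()

print(arrival_date)    # 2021-05-08
print(departure_date)  # 2021-05-16
```

Looking parameters up by name means the result does not depend on where they sit in the URL; a missing parameter raises `KeyError`, which can be avoided with `query.get("arrival")`.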