
Extract dates from a URL with regex

I would like to parse the dates from the following URL:

url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'

This is what I tried:

import re

arrival_date = re.split('arrival=', url)
print(arrival_date[1])  # prints everything after "arrival=", including the departure parameter

You can use the regex (\d{4}-\d{2}-\d{2}), which means "4 digits, a hyphen, 2 digits, a hyphen, 2 digits". Prefixing it with the parameter name selects the date you want:

import re
url = 'https://www.vrbo.com/el-gr/%CE%B5%CE%BD%CE%BF%CE%B9%CE%BA%CE%B9%CE%AC%CF%83%CE%B5%CE%B9%CF%82-%CE%B5%CE%BE%CE%BF%CF%87%CE%B9%CE%BA%CF%8E%CE%BD-%CE%BA%CE%B1%CF%84%CE%BF%CE%B9%CE%BA%CE%B9%CF%8E%CE%BD/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16'
date_regex = r"(\d{4}-\d{2}-\d{2})"
arrival_date = re.search(r"arrival=" + date_regex, url).group(1)
departure_date = re.search(r"departure=" + date_regex, url).group(1)
print(arrival_date)     # 2021-05-08
print(departure_date)   # 2021-05-16
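
One caveat with the snippet above: re.search returns None when the parameter is missing, so calling .group(1) would raise an AttributeError. Below is a minimal sketch of a guarded variant that also converts the match to a datetime.date. The helper name extract_date and the shortened URL are assumptions made for illustration, not part of the original answer.

```python
import re
from datetime import date

DATE_REGEX = r"(\d{4}-\d{2}-\d{2})"

def extract_date(url, param):
    """Return the date that follows `param=` in url, or None if it is absent.

    extract_date is a hypothetical helper name used only for this sketch.
    """
    # [?&] anchors the match to the start of a query parameter.
    match = re.search(r"[?&]" + re.escape(param) + "=" + DATE_REGEX, url)
    if match is None:
        return None
    # fromisoformat raises ValueError for impossible dates such as 2021-13-40.
    return date.fromisoformat(match.group(1))

# Shortened stand-in for the URL in the question (assumption).
url = "https://www.vrbo.com/el-gr/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16"

print(extract_date(url, "arrival"))    # 2021-05-08
print(extract_date(url, "departure"))  # 2021-05-16
print(extract_date(url, "checkin"))    # None
```

Returning None keeps the "parameter not present" case separate from a malformed date, which still surfaces as a ValueError.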

Assuming the data is always in this format, you can do this:

dates = list(map(lambda date: date.split("=")[1],url.split("&")[1:]))

This returns a two-element list containing both dates, in a single line. To get only the arrival date, change the slice [1:] to suit your needs (for example, [1:2]). Note that this relies on the query parameters always appearing in the same order.
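
If the parameter order cannot be guaranteed, the standard library's urllib.parse can look the values up by name instead of by position. This is a different technique from either approach above, shown only as a sketch; the shortened URL is a stand-in for the one in the question.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened stand-in for the URL in the question (assumption).
url = "https://www.vrbo.com/el-gr/p436144?adultsCount=2&arrival=2021-05-08&departure=2021-05-16"

# parse_qs maps each parameter name to a list of its values.
query = parse_qs(urlsplit(url).query)

arrival = query["arrival"][0]
departure = query["departure"][0]

print(arrival)    # 2021-05-08
print(departure)  # 2021-05-16
```

A missing parameter raises KeyError here; query.get("arrival") can be used instead if absence should be tolerated.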


 