
Python Regex: Match first date and then second

I am working with pandas DataFrames and want to create two series, Start Date and End Date, from the dates within the Description column. I am using regex to find the occurrences of dates, but I can't figure out how to stop at the first date and then continue on to find the second date.

Looking at "How to stop at first occurrence of match?"

yielded this answer:

(?s)(\d{1,2}/\d{1,2}/\d{2,4}).*

But this didn't work for me; I was still capturing all dates instead of only the first.

Using

(\d{1,2}/\d{1,2}/\d{2,4})? 

didn't work either.

Essentially, this is what I am trying to do:

import re

pattern_generic = re.compile(r'(\d{1,2}/\d{1,2}/\d{2,4})')  # perhaps will do start and end
# pattern_start / pattern_end: the patterns I still need for the first and second date
report['Start Date'] = report['Description'].apply(lambda x: re.findall(pattern_start, x))
report['End Date'] = report['Description'].apply(lambda x: re.findall(pattern_end, x))

I'm not sure whether this is the best way to find the first and second dates and put them into columns. Any help/advice is appreciated!
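Here is a sketch of the findall route the question hints at. It collects every date with one generic pattern, then takes the first and second matches by position. It assumes a `report` DataFrame with a `Description` column, and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `report` DataFrame.
report = pd.DataFrame({
    "Description": [
        "Purchased Subscription from 1/2/13-3/4/15",
        "Renewed on 5/6/14",
        "No dates here",
    ]
})

pattern_generic = r"\d{1,2}/\d{1,2}/\d{2,4}"

# str.findall gives, per row, a list of every date found (possibly empty).
all_dates = report["Description"].str.findall(pattern_generic)

# str.get(i) takes the i-th list element, or NaN when the list is too short.
report["Start Date"] = all_dates.str.get(0)
report["End Date"] = all_dates.str.get(1)
print(report)
```

Taking matches by position does not check that the two dates belong together, for example as a `start-end` range. A pattern with two capture groups, like the ones in the answers, makes that relationship explicit.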

Edit:

Example to clarify: I have a DataFrame with a column titled 'Description' containing items such as 'Purchased subscription from 1/2/13-3/4/15'. I want to capture the two dates into two columns, Start and End:

 Description                                       Start Date     End Date
 'Purchased Subscription from 1/2/13-3/4/15'        1/2/13        3/4/15

I'd use this:

(?s)\b(\d{1,2}/\d{1,2}/\d{2,4})\b-\b(\d{1,2}/\d{1,2}/\d{2,4})\b

The start date will be in group 1 and the end date in group 2.
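In pandas, `Series.str.extract` can pull both groups into columns in one step, because it returns one column per capture group. The sketch below assumes a `report` DataFrame with a `Description` column, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; the real data is the asker's `report` DataFrame.
report = pd.DataFrame({
    "Description": [
        "Purchased Subscription from 1/2/13-3/4/15",
        "Purchased Subscription from 10/12/2013-11/1/2014",
        "Cancelled",
    ]
})

# The answer's pattern: group 1 = start date, group 2 = end date.
pattern = r"(?s)\b(\d{1,2}/\d{1,2}/\d{2,4})\b-\b(\d{1,2}/\d{1,2}/\d{2,4})\b"

# str.extract returns a DataFrame with one column per group (labelled 0 and 1);
# rows with no match get NaN in both columns.
dates = report["Description"].str.extract(pattern)
report["Start Date"] = dates[0]
report["End Date"] = dates[1]
print(report)
```

Rows without a `date-date` span come back as NaN. The `\b` anchors stop a date from starting or ending in the middle of a longer run of digits.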

You could use the regex below:

(?s)(\d{1,2}/\d{1,2}/\d{2,4})-(\d{1,2}/\d{1,2}/\d{2,4}).*


Assign the text captured by group 1 to Start Date and the text captured by group 2 to End Date.

>>> s = """'Purchased Subscription from 1/2/13-3/4/15'        1/2/13        3/4/15
foo 1/2/13-3/4/15'        5/2/13        6/4/15
1/2/13-3/4/15'        7/2/13        8/4/15
1/2/13-3/4/15'        9/2/13        10/4/15"""
>>> m = re.search(r'(?s)(\d{1,2}\/\d{1,2}\/\d{2,4})-(\d{1,2}\/\d{1,2}\/\d{2,4}).*', s)
>>> m.group(1)
'1/2/13'
>>> m.group(2)
'3/4/15'
>>> m = re.findall(r'(\d{1,2}\/\d{1,2}\/\d{2,4})-(\d{1,2}\/\d{1,2}\/\d{2,4}).*', s, re.DOTALL)
>>> m
[('1/2/13', '3/4/15')]
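
The `findall` call above returns only one tuple. With `re.DOTALL`, the trailing `.*` consumes everything after the first match, so no text is left for further matches. The short sketch below compares it with the same pattern minus the tail, using the sample string `s`.

```python
import re

s = """'Purchased Subscription from 1/2/13-3/4/15'        1/2/13        3/4/15
foo 1/2/13-3/4/15'        5/2/13        6/4/15
1/2/13-3/4/15'        7/2/13        8/4/15
1/2/13-3/4/15'        9/2/13        10/4/15"""

date = r"\d{1,2}/\d{1,2}/\d{2,4}"

# With DOTALL, the trailing .* swallows the rest of the string after the
# first hit, so findall can only ever report one pair.
with_tail = re.findall(rf"({date})-({date}).*", s, re.DOTALL)

# Without the tail, findall keeps scanning and returns every start-end pair.
without_tail = re.findall(rf"({date})-({date})", s)

print(with_tail)     # one tuple
print(without_tail)  # four tuples, one per line
```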

.*'\s+(\d+\/\d+\/\d+)\s+(\d+\/\d+\/\d+)

Try this. Start dates will be in group 1 and end dates in group 2.

See demo:

http://regex101.com/r/zN5mL9/1
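This pattern reads the two dates that follow a closing quote. It therefore assumes each input line looks like the sample row in the question's table: a quoted description, then the start and end dates separated by whitespace. Here is a minimal sketch applying it line by line with `re.findall`, using sample lines taken from the earlier interactive session.

```python
import re

lines = """'Purchased Subscription from 1/2/13-3/4/15'        1/2/13        3/4/15
foo 1/2/13-3/4/15'        5/2/13        6/4/15"""

# Greedy .* (which does not cross newlines without DOTALL) backtracks to the
# rightmost quote that is followed by whitespace, a date, whitespace, a date.
pattern = re.compile(r".*'\s+(\d+/\d+/\d+)\s+(\d+/\d+/\d+)")

pairs = pattern.findall(lines)
print(pairs)
```

If the Description column holds only the quoted text, for example 'Purchased Subscription from 1/2/13-3/4/15', there is nothing after the quote and this pattern finds no match. In that case a `date-date` pattern is the better fit.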

Here is the code I used to fully solve my problem:

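import re
from pandas import Series

# First date, then (lazily) the nearest following date; (?s) lets '.' span newlines.
pattern = re.compile(r'(?s)(\d{1,2}/\d{1,2}/\d{2,4}).*?(\d{1,2}/\d{1,2}/\d{2,4}).*')

first_list = []
second_list = []
for x in data['Product Description']:
    # Non-string cells (e.g. NaN) cannot be searched, so treat them as no match.
    m = pattern.search(x) if isinstance(x, str) else None
    if m is None:
        first_list.append('')
        second_list.append('')
    else:
        first_list.append(m.group(1))
        second_list.append(m.group(2))

# Reuse data's index so values line up row for row even if it is not 0..n-1.
data['Start Date'] = Series(first_list, index=data.index)
data['End Date'] = Series(second_list, index=data.index)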
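The loop above can also be written without explicit iteration. This sketch assumes the same `data['Product Description']` column, and the sample rows below are made up. `str.extract` searches each row and keeps `data`'s index, so rows stay aligned. `fillna('')` mirrors the loop's empty strings for rows without two dates. The trailing `.*` is dropped here because it does not change either group.

```python
import pandas as pd

# Hypothetical sample data; the real column is data['Product Description'].
data = pd.DataFrame(
    {"Product Description": [
        "Purchased subscription from 1/2/13-3/4/15",
        "Trial 4/5/14 to 5/5/14",
        None,
    ]},
    index=[10, 20, 30],  # non-default index, to show alignment is preserved
)

# Same idea as the loop: first date, then the nearest following date.
pattern = r"(?s)(\d{1,2}/\d{1,2}/\d{2,4}).*?(\d{1,2}/\d{1,2}/\d{2,4})"

# Unmatched or missing rows come back as NaN; replace them with '' like the loop.
dates = data["Product Description"].str.extract(pattern).fillna("")
data["Start Date"] = dates[0]
data["End Date"] = dates[1]
print(data)
```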

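Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM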