Regex to exclude a numeric pattern

Question

I have the following script that is meant to pull out record IDs within parentheses. It is also meant to ignore other details that are within parentheses. I am running into an issue with a detail like (YYYY-DD): I am not sure how to exclude it, because I need to keep other record IDs that do include '-'. The script also starts from the end of the string, which is important.

Script:

df1['Doc ID'] = df['Folder Path'].str.extract('.*\((?!Data Only)(.*)\).*',expand=True)

I have tried adding:

[^\d\d\d\d-\d\d], (?!date_format) and neither work

Please look at the 3rd instance; this is where my problem lies:

  Folder Path                                               Doc ID
1 report/global/(Data Only)/admin (245)                     245 #245 is kept, 'Data Only' successfully ignored
2 report/regional(PRSP)/tech/(121,130,505 - RETIRED)/2018   121,130,505 - RETIRED #successfully ignores (PRSP)
3 global/report/admin (505)/(2018-03)                       2018-03 #I cannot figure out how to avoid 2018-03 or any YYYY-DD sequence and only grab 505 in this instance

Answer 1

If you want to start from the right, you should express that in your RE. I would suggest this as a starting point:

df1['Doc ID'] = df['Folder Path'].str.extract(r'\(([^(]*?)\)[^)]*$', expand=True)
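To see what this starting pattern captures on the three sample paths from the question, here is a small check. It is only a sketch: it uses the standard `re` module instead of pandas, on the assumption that `str.extract` returns the first match in the same way `re.search` does, and the variable names are mine.

```python
import re

# Sample paths copied from the question.
paths = [
    "report/global/(Data Only)/admin (245)",
    "report/regional(PRSP)/tech/(121,130,505 - RETIRED)/2018",
    "global/report/admin (505)/(2018-03)",
]

# Capture a parenthesised group that is followed by no further ')'
# before the end of the string, i.e. the last group in the path.
last_group = re.compile(r"\(([^(]*?)\)[^)]*$")

captured = [last_group.search(p).group(1) for p in paths]
print(captured)
# ['245', '121,130,505 - RETIRED', '2018-03']
```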

But this still matches the date, so let us insert a subpattern that allows an optional date group after the captured one:

df1['Doc ID'] = df['Folder Path'].str.extract(r'\(([^(]*?)\)[^)]*(?:\(\d{4}-\d{2}\))?[^)]*$', expand=True)
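The date-aware pattern can be checked against the same sample paths. Again this is a sketch with plain `re` rather than pandas, and `extract_doc_id` is a hypothetical helper name used only for illustration.

```python
import re

# Pattern from the answer: capture a parenthesised group, then allow one
# optional "(dddd-dd)" group between it and the end of the string.
DOC_ID = re.compile(r"\(([^(]*?)\)[^)]*(?:\(\d{4}-\d{2}\))?[^)]*$")

def extract_doc_id(path):
    """Return the captured text, or None when the path has no match."""
    match = DOC_ID.search(path)
    return match.group(1) if match else None

samples = [
    "report/global/(Data Only)/admin (245)",
    "report/regional(PRSP)/tech/(121,130,505 - RETIRED)/2018",
    "global/report/admin (505)/(2018-03)",
]
results = [extract_doc_id(p) for p in samples]
print(results)
# ['245', '121,130,505 - RETIRED', '505']
```

One caveat: the date group is only skipped when another parenthesised group precedes it. A path whose only parenthesised part is a date, such as `admin/(2018-03)`, would still return `2018-03`.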
