Using a regular expression to extract a substring from a Python string

Question

I have a pandas column like this:

LOD-NY-EP-ADM
LOD-NY-EC-RUL
LOD-NY-EC-WFL
LOD-NY-LSM-SER
LOD-NY-PM-MOB
LOD-NY-PM-MOB
LOD-NY-RMK
LOD-NY-EC-TIM

I want the output in a new column as:

EP
EC
EC
LSM
PM
PM
RMK
EC

I tried this:

pattern=df.column[0:10].str.extract(r"\w*-NY-(.*?)-\w*",expand=False)

It works for every row except LOD-NY-RMK, where it fails to extract RMK and gives NaN. Nothing follows RMK, and the pattern looks for -\w zero or more times, so I expected it to still match when there is nothing after RMK.

Any idea what's going wrong?

If pandas syntax is unfamiliar, a plain array of these strings and a regular expression is fine too.

Answer 1

Could you just use regular Python? Let df be your dataframe, and row be the name of your column.

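import pandas as pd

series = df.row  # the column holding strings such as "LOD-NY-EP-ADM"
# the third dash-separated field is the code we want
new_list = [i.split('-')[2] for i in series]
new_series = pd.Series(new_list)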
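
If you would rather stay inside pandas, the same split can probably be done with the vectorized string accessor. This is only a minimal sketch: the sample data is copied from the question, the column name column is taken from the asker's snippet, and the new column name code is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; "column" is the name used in the asker's snippet.
df = pd.DataFrame({"column": [
    "LOD-NY-EP-ADM", "LOD-NY-EC-RUL", "LOD-NY-EC-WFL", "LOD-NY-LSM-SER",
    "LOD-NY-PM-MOB", "LOD-NY-PM-MOB", "LOD-NY-RMK", "LOD-NY-EC-TIM",
]})

# Split each string on "-" and keep the third field.
# "code" is a hypothetical name for the new column.
df["code"] = df["column"].str.split("-").str[2]
print(df["code"].tolist())
```

Like the list comprehension, this assumes every value has at least three dash-separated fields.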

Answer 2

pattern=df.column[0:10].str.extract(r"\w*-NY-(\w+)",expand=False)

See https://regex101.com/r/3uDpam/3

Your regex meant that matching strings must contain three - characters. I changed it so the last -XX could occur 0 or 1 times.

UPDATE: Changed so the 2nd group is non-capturing (added ?:).

UPDATE: Thanks to Casimir, removed the useless group at the end of the pattern.
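
To see the difference between the two patterns, here is a small sketch that runs both on a few of the question's values. The Series s and the variable names original and fixed are made up for the comparison; the patterns are the ones from the question and from this answer.

```python
import pandas as pd

# Three values from the question, including the one that failed.
s = pd.Series(["LOD-NY-EP-ADM", "LOD-NY-LSM-SER", "LOD-NY-RMK"])

# Pattern from the question: the "-" after the lazy group is mandatory,
# so "LOD-NY-RMK" (only two dashes) cannot match and yields NaN.
original = s.str.extract(r"\w*-NY-(.*?)-\w*", expand=False)

# Pattern from this answer: \w+ stops at the next "-" or at the end of the
# string, so nothing is required after the captured code.
fixed = s.str.extract(r"\w*-NY-(\w+)", expand=False)

print(original.tolist())
print(fixed.tolist())
```

The \w* at the end of the original pattern can match nothing, but the literal - in front of it cannot, which is why the RMK row came back as NaN.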

2 answers

Answer 1
1 2018-04-09 22:45:34

Answer 2
1 2018-04-09 22:55:08
