How to match a regex pattern and replace it with a matching group using Pandas?
I have the following Pandas Series:
SC_S193_M7.CONTROLDAY10.EPI.P1_Stem
SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet
SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte
SC_S106_M1.CONTROLDAY3.EPI2_Goblet
I want to use regex to extract the string after the last underscore in each row of this series. I was able to come up with a regex that matches everything before the last string, but I'm not sure how to apply it with a pandas Series method.
The regex I used to match the pattern and replace with the first matching group \\1:
SC_S\\d{3}_M\\d\\.CONTROLDAY\\d{1,2}\\.EPI\\d?(?:\\.P\\d_|_)
I tried using .replace() as follows, but that did not work out:
.replace('SC_S\\d{3}_M\\d\\.CONTROLDAY\\d{1,2}\\.EPI\\d?(?:\\.P\\d_|_)(\\w+)')
Any idea how to use a Pandas Series method to extract the string after the last underscore, or to find the matching pattern and replace it with the first group?
I think you can split it instead of using RegEx:
In [170]: s
Out[170]:
0 SC_S193_M7.CONTROLDAY10.EPI.P1_Stem
1 SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet
2 SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte
3 SC_S106_M1.CONTROLDAY3.EPI2_Goblet
Name: 0, dtype: object
In [171]: s.str.split('_').str[-1]
Out[171]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
or better, using rsplit(..., n=1):
In [174]: s.str.rsplit('_', n=1).str[-1]
Out[174]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
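For anyone who wants to reproduce this without the IPython session, here is a minimal self-contained sketch. The Series is rebuilt by hand from the four rows in the question; the variable names are only illustrative. It shows why rsplit(..., n=1) is the tidier choice: it splits once from the right instead of on every underscore.

```python
import pandas as pd

# Series rebuilt by hand from the rows shown in the question.
s = pd.Series([
    "SC_S193_M7.CONTROLDAY10.EPI.P1_Stem",
    "SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet",
    "SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte",
    "SC_S106_M1.CONTROLDAY3.EPI2_Goblet",
])

# split() breaks on every underscore, so each row becomes a list of several pieces.
all_parts = s.str.split("_")

# rsplit(n=1) performs a single split starting from the right,
# so each row becomes [everything before the last "_", last piece].
two_parts = s.str.rsplit("_", n=1)

# Either way, the last list element is the part after the final underscore.
last_via_split = all_parts.str[-1]
last_via_rsplit = two_parts.str[-1]

print(two_parts.iloc[0])
print(last_via_rsplit.tolist())
```

Both approaches give the same final column here; rsplit just avoids building the intermediate pieces you never use.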
Alternatively, you can use .str.extract():
In [177]: s.str.extract(r'.*_([^_]*)$', expand=False)
Out[177]:
0 Stem
1 Goblet
2 Enterocyte
3 Goblet
Name: 0, dtype: object
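One practical difference between the split and extract approaches is what happens to rows that contain no underscore at all. The sketch below adds a hypothetical third row ("NoUnderscoreHere", not part of the original data) to illustrate: .str.extract() leaves a missing value where the pattern does not match, whereas rsplit hands back the whole string.

```python
import pandas as pd

s = pd.Series([
    "SC_S193_M7.CONTROLDAY10.EPI.P1_Stem",
    "SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte",
    "NoUnderscoreHere",  # hypothetical row, added only for illustration
])

# With a single capture group and expand=False the result is a Series.
# Rows where the pattern does not match become NaN.
extracted = s.str.extract(r".*_([^_]*)$", expand=False)

# rsplit has nothing to split on in the last row,
# so the "last piece" is simply the whole string.
split_last = s.str.rsplit("_", n=1).str[-1]

print(extracted)
print(split_last)
```

Which behaviour you want depends on whether a row without an underscore should count as missing or be passed through unchanged.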
Another variant that should work (assuming s is your Series) would be something like:
import re
s.apply(lambda r: re.sub(r'.*_([^_]*)$', r'\1', r))
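To come back to the "replace with the first group" wording in the question: the same substitution can probably be done without apply, using Series.str.replace with regex=True and a \1 backreference. The sketch below is an illustration rather than a confirmed fix for the original attempt. It assumes the doubled backslashes in the question were just escaping and writes the asker's pattern as a Python raw string with a trailing capture group.

```python
import pandas as pd

s = pd.Series([
    "SC_S193_M7.CONTROLDAY10.EPI.P1_Stem",
    "SC_S194_M7.CONTROLDAY10.EPI.P1_Goblet",
    "SC_S102_M1.CONTROLDAY3.EPI2_Enterocyte",
    "SC_S106_M1.CONTROLDAY3.EPI2_Goblet",
])

# Short pattern from the answers: match the whole string,
# then substitute the captured tail (group 1) for it.
short = s.str.replace(r".*_([^_]*)$", r"\1", regex=True)

# The asker's longer pattern, written as a raw string (assumed form),
# with (\w+) capturing the text after the prefix.
asker_pattern = r"SC_S\d{3}_M\d\.CONTROLDAY\d{1,2}\.EPI\d?(?:\.P\d_|_)(\w+)"
long = s.str.replace(asker_pattern, r"\1", regex=True)

print(short.tolist())
print(long.tolist())
```

Note that this goes through the .str accessor and passes both a pattern and a replacement; the attempt in the question called .replace() with a pattern only.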