python pandas separate string column in two by whitespace
I have a Python pandas DataFrame df with the following column "title":
title
This is the first title XY2547
This is the second title WWW48921
This is the third title A2438999
This is another title 123
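I need to split this column in two: the actual title and the irregular code at the end. Is there a way to split it on the last word after a whitespace? Note that the last title has no code, and 123 is part of the title.
Final goal df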
title                             | cleaned title             | code
This is the first title XY2547    | This is the first title   | XY2547
This is the second title WWW48921 | This is the second title  | WWW48921
This is the third title A2438999  | This is the third title   | A2438999
This is another title 123         | This is another title 123 |
I was thinking of something like
df['code'] = df.title.str.extract(r'_\s(\w)', expand=False)
This does not work.
Thanks
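A side note on why that attempt returns nothing: the pattern `_\s(\w)` requires a literal underscore before the whitespace, and none of the titles contains one; `(\w)` would also capture only a single character. The sketch below checks this with Python's standard `re` module and compares it with `\s(\w+)$`, which captures the last whitespace-separated word. The variable names are illustrative and not part of the question, and note that the second pattern still treats 123 as a code.

```python
import re

# Sample titles copied from the question.
titles = [
    "This is the first title XY2547",
    "This is the second title WWW48921",
    "This is the third title A2438999",
    "This is another title 123",
]

attempt = re.compile(r"_\s(\w)")      # needs a literal "_" before the whitespace
last_word = re.compile(r"\s(\w+)$")   # whitespace, then the final run of word characters

# The original attempt finds nothing in any title.
attempt_hits = [attempt.search(t) for t in titles]

# Anchoring at the end of the string picks up the last word of every title.
last_words = [last_word.search(t).group(1) for t in titles]

print(attempt_hits)
print(last_words)
```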
Try this:
In [62]: df
Out[62]:
title
0 This is the first title XY2547
1 This is the second title WWW48921
2 This is the third title A2438999
3 This is another title 123
In [63]: df[['cleaned_title', 'code']] = \
...: df.title.str.extract(r'(.*?)\s*([A-Z]{1,}\d{3,})?$', expand=True)
In [64]: df
Out[64]:
                               title              cleaned_title      code
0     This is the first title XY2547    This is the first title    XY2547
1  This is the second title WWW48921   This is the second title  WWW48921
2   This is the third title A2438999    This is the third title  A2438999
3          This is another title 123  This is another title 123       NaN
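For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It builds the sample frame from the question and uses `\s*` between the two groups, on the assumption that a title with no trailing code should still match as a whole. The code rule (one or more capital letters followed by three or more digits) is the one from the answer; the intermediate name `extracted` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "title": [
        "This is the first title XY2547",
        "This is the second title WWW48921",
        "This is the third title A2438999",
        "This is another title 123",
    ]
})

# Lazy title, optional whitespace, then an optional code made of
# one or more capital letters followed by three or more digits.
pattern = r"(.*?)\s*([A-Z]{1,}\d{3,})?$"

# With expand=True, extract returns one column per capture group (0 and 1).
extracted = df["title"].str.extract(pattern, expand=True)
df["cleaned_title"] = extracted[0]
df["code"] = extracted[1]

print(df)
```

Rows whose optional group does not take part in the match get a missing value in `code`, which is why the last row keeps its full title.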
#1
str.rsplit can be used here. Starting from the right-hand side of the string, it splits n times.
We can then join the result with df.
df.join(
df.title.str.rsplit(n=1, expand=True).rename(
columns={0: 'cleaned title', 1: 'code'}
)
)
                               title             cleaned title      code
0     This is the first title XY2547   This is the first title    XY2547
1  This is the second title WWW48921  This is the second title  WWW48921
2   This is the third title A2438999   This is the third title  A2438999
3          This is another title 123     This is another title       123
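The pandas method mirrors Python's built-in `str.rsplit`, with `n` playing the role of `maxsplit`. A small sketch on plain strings (no pandas needed, variable names illustrative) shows what a single split from the right does to each title, including the last row where 123 ends up in the code position:

```python
# Sample titles copied from the question.
titles = [
    "This is the first title XY2547",
    "This is the second title WWW48921",
    "This is the third title A2438999",
    "This is another title 123",
]

# maxsplit=1 is the built-in counterpart of n=1: split once on whitespace,
# starting from the right, so only the last word is separated.
parts = [t.rsplit(maxsplit=1) for t in titles]

for cleaned, code in parts:
    print(repr(cleaned), repr(code))
```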
#2
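To avoid interpreting 123 as a code, you have to apply some additional logic that you did not provide. @MaxU was kind enough to embed his logic in a regex.
My regex solution looks like this.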
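Plan
'?P<name>' names the resulting column
'[A-Z0-9]' matches capital letters and digits only
'{4,}' requires 4 or more of them
'^' ... '$' anchors the match from the start of the string to the end
'.*' matches anything
'.*?' is the non-greedy form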
regex = r'^(?P<cleaned_title>.*?)\s*(?P<code>[A-Z0-9]{4,})?$'
df.join(df.title.str.extract(regex, expand=True))
                               title              cleaned_title      code
0     This is the first title XY2547    This is the first title    XY2547
1  This is the second title WWW48921   This is the second title  WWW48921
2   This is the third title A2438999    This is the third title  A2438999
3          This is another title 123  This is another title 123       NaN
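To see how the named groups behave on a single string, the same pattern can be tried with the standard `re` module. This is only a sketch: `split_title` is a hypothetical helper, and "Annual report 2017" is a made-up title added to show one limitation, namely that any trailing run of four or more capitals or digits (a year, for instance) is read as a code.

```python
import re

regex = r'^(?P<cleaned_title>.*?)\s*(?P<code>[A-Z0-9]{4,})?$'

def split_title(title):
    """Return the named groups as a dict; 'code' is None when no code matched."""
    return re.match(regex, title).groupdict()

with_code = split_title("This is the first title XY2547")
without_code = split_title("This is another title 123")
# Hypothetical title, not from the question: the trailing year is taken as a code.
year_case = split_title("Annual report 2017")

print(with_code)
print(without_code)
print(year_case)
```

In pandas, the `None` for an unmatched group shows up as NaN in the `code` column.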