Join two strings and get rid of the list (regex)
I have a dataframe whose column B holds the two parts of a string, extracted from column A with this regex:
df['B'] = df['A'].str.findall(r'([S][\d]|[V][\d]{3})')
A B
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 ['S1', 'V087']
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 ['S1', 'V023']
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 ['S1', 'V155']
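To reproduce this, a minimal setup might look like the following. It assumes df is a plain pandas DataFrame with the default 0-based index (the table above shows 1-based labels) and uses the three sample strings from column A. It shows why B ends up holding lists: str.findall returns every match in a row as a Python list.

```python
import pandas as pd

# The three sample values from column A; default 0-based index assumed
df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# findall collects all matches per row into a list, so each cell of B is a list
df["B"] = df["A"].str.findall(r"([S][\d]|[V][\d]{3})")
print(df["B"].tolist())  # [['S1', 'V087'], ['S1', 'V023'], ['S1', 'V155']]
```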
I want to get rid of the list in column B and join the two strings with '_'.
The result should look like this:
A B
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 S1_V087
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 S1_V023
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 S1_V155
The other thing I want to extract from column A with a regex is this part of the string, as shown below.
I have no idea what the regex would look like!
A C
1 R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593 S1_1785984
2 R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105 S1_5896589
3 R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033 S1_2541236
Sorry for the double question; I'd appreciate your help!
Use str.join("_").
Example:
df['B'] = df['B'].str.join("_")
print(df['B'])
Output:
0 S1_V087
1 S1_V023
2 S1_V155
Name: B, dtype: object
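The extraction and the join can also be chained in one step, since findall produces the lists that str.join then flattens. This sketch assumes the same three sample strings; it also drops the single-character classes [S] and [V] and the outer group from the question's pattern, which should not change what is matched here.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# findall yields a list of matches per row; str.join turns each list into one string
df["B"] = df["A"].str.findall(r"S\d|V\d{3}").str.join("_")
print(df["B"].tolist())  # ['S1_V087', 'S1_V023', 'S1_V155']
```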
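Use a regex to extract the number:
# expand=False returns a Series instead of a one-column DataFrame; note that "S1_" is hard-coded
df['C'] = "S1_" + df['A'].str.extract(r"(\d+)_\d+$", expand=False)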
print(df['C'])
Output:
0 S1_1785984
1 S1_5896589
2 S1_2541236
Name: C, dtype: object
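If the S part is not always S1, one option is to capture it in the same regex instead of hard-coding the prefix. This is a sketch that assumes, like the question's [S][\d], that the S token is "S" plus a single digit surrounded by underscores, and that the wanted number is always the second-to-last underscore-separated field.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# Group 0: the S token, e.g. "S1".
# Group 1: the digits just before the final "_<digits>"; the greedy ".*"
# pushes this group as far right as the "$" anchor allows.
parts = df["A"].str.extract(r"_(S\d)_.*_(\d+)_\d+$")
df["C"] = parts[0] + "_" + parts[1]
print(df["C"].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```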
For the first one, you just need to apply '_'.join to B:
df['B'] = df['B'].apply('_'.join)
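One difference between the two answers shows up with missing values: str.join leaves NaN in place, while apply('_'.join) calls the join on every element and fails on a float NaN. The column below is a hypothetical example with one missing row.

```python
import numpy as np
import pandas as pd

# Hypothetical column B with one missing entry
b = pd.Series([["S1", "V087"], np.nan, ["S1", "V155"]])

# The .str accessor skips missing values and keeps NaN in the result
joined = b.str.join("_")
print(joined.tolist())  # ['S1_V087', nan, 'S1_V155']

# apply passes the NaN float straight to "_".join, which raises TypeError
apply_error = None
try:
    b.apply("_".join)
except TypeError as exc:
    apply_error = exc
print(type(apply_error).__name__)  # TypeError
```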
For the second, you don't need a regex at all; just split on '_', take the parts you need, and join them again:
df['C'] = df['A'].apply(lambda x: '_'.join([x.split('_')[4], x.split('_')[-2]]))
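The same split-based idea can be written with the vectorized .str accessor, which splits each string only once. This sketch assumes every value in A has the same underscore layout, so field 4 is the S token and field -2 is the long number.

```python
import pandas as pd

df = pd.DataFrame({"A": [
    "R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593",
    "R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105",
    "R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033",
]})

# Split each string into a list of fields once
fields = df["A"].str.split("_")
# .str[i] indexes into each list; negative indices count from the end
df["C"] = fields.str[4] + "_" + fields.str[-2]
print(df["C"].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```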