Create a column based on whether a string is a substring in a pandas DataFrame
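One column in my DataFrame holds identifier names that follow a specific naming convention, but the names were not entered consistently. How can I find, in Python, which keyword each name contains and put that keyword in its own column? Maybe with some kind of loop?
Example: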
types = ['XYZ', 'OPQ', 'MNO', 'ABC']
Current df:
ID ID Name
45 I_name_ls_XYZ_random
46 I_22_name_ABC_random
47 I_name_ls_XYZ_random_45
48 I_name_ls_MNO_random
49 I_ls_OPQ_random_name
50 I_name_ls_ABC_random
51 I_name_ls_XYZ_random
52 I_name_MNO_random
Desired result:
ID ID Name types
45 I_name_ls_XYZ_random XYZ
46 I_22_name_ABC_random ABC
47 I_name_ls_XYZ_random_45 XYZ
48 I_name_ls_MNO_random MNO
49 I_ls_OPQ_random_name OPQ
50 I_name_ls_ABC_random ABC
51 I_name_ls_XYZ_random XYZ
52 I_name_MNO_random MNO
Use str.extract with a regex alternation built from the keyword list:
df['types'] = df.Name.str.extract('({})'.format('|'.join(types)))
ID Name types
0 45 I_name_ls_XYZ_random XYZ
1 46 I_22_name_ABC_random ABC
2 47 I_name_ls_XYZ_random_45 XYZ
3 48 I_name_ls_MNO_random MNO
4 49 I_ls_OPQ_random_name OPQ
5 50 I_name_ls_ABC_random ABC
6 51 I_name_ls_XYZ_random XYZ
7 52 I_name_MNO_random MNO
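A self-contained sketch of the str.extract approach, assuming the identifier column is called Name as in this answer. Two details are our own additions, not part of the answer: re.escape keeps each keyword literal in case it ever contains regex metacharacters, and expand=False makes str.extract return a Series instead of a one-column DataFrame. The last row is a hypothetical name with no keyword, which gets NaN.

```python
import re

import pandas as pd

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

df = pd.DataFrame({
    'ID': [45, 46, 47, 53],
    'Name': [
        'I_name_ls_XYZ_random',
        'I_22_name_ABC_random',
        'I_name_ls_XYZ_random_45',
        'I_name_ls_random',  # hypothetical row containing no keyword
    ],
})

# Build a single capture group such as "(XYZ|OPQ|MNO|ABC)".
pattern = '({})'.format('|'.join(map(re.escape, types)))

# expand=False returns a Series for one capture group; unmatched rows get NaN.
df['types'] = df['Name'].str.extract(pattern, expand=False)
print(df)
```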
If a name may contain more than one keyword, use findall to get every match:
df
ID Name
0 45 I_name_ls_XYZ_ABCrandom
df.Name.str.findall(r'|'.join(types))
0 [XYZ, ABC]
Name: Name, dtype: object
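If a single string per row is still wanted, one option (our own extension, not part of the answer) is to join each findall list with Series.str.join, which joins list elements with a separator. The second row below is hypothetical and shows that a name with no keyword yields an empty list, and therefore an empty string.

```python
import re

import pandas as pd

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

df = pd.DataFrame({
    'ID': [45, 46],
    'Name': ['I_name_ls_XYZ_ABCrandom', 'I_name_ls_random'],  # second row is hypothetical
})

pattern = '|'.join(map(re.escape, types))

# findall returns a list of all non-overlapping matches for each string.
df['all_types'] = df['Name'].str.findall(pattern)

# str.join on a Series of lists concatenates each list with the separator.
df['types'] = df['all_types'].str.join(',')
print(df)
```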
Use pd.Series.apply with a custom function and a generator expression:
types = {'XYZ', 'OPQ', 'MNO', 'ABC'}
def string_filter(x):
    # Return the first underscore-separated token that is a known type, or None.
    return next((i for i in x.split('_') if i in types), None)
df['types'] = df['ID_Name'].apply(string_filter)
print(df)
ID ID_Name types
0 45 I_name_ls_XYZ_random XYZ
1 46 I_22_name_ABC_random ABC
2 47 I_name_ls_XYZ_random_45 XYZ
3 48 I_name_ls_MNO_random MNO
4 49 I_ls_OPQ_random_name OPQ
5 50 I_name_ls_ABC_random ABC
6 51 I_name_ls_XYZ_random XYZ
7 52 I_name_MNO_random MNO
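One behavioral difference worth knowing: string_filter matches only whole underscore-separated tokens and returns only the first one, while the regex answers match the keyword anywhere in the string. A quick comparison on the value from the findall example, reusing the answer's function (the no-keyword name is hypothetical):

```python
import re

types = {'XYZ', 'OPQ', 'MNO', 'ABC'}

def string_filter(x):
    # Return the first underscore-separated token that is a known type, or None.
    return next((i for i in x.split('_') if i in types), None)

name = 'I_name_ls_XYZ_ABCrandom'

# Token match: "ABCrandom" is not in types, so only "XYZ" is found.
print(string_filter(name))

# Substring match also finds "ABC" inside "ABCrandom".
# sorted() gives the alternation a fixed order, since sets are unordered.
pattern = '|'.join(map(re.escape, sorted(types)))
print(re.findall(pattern, name))

# A name with no matching token gives None.
print(string_filter('I_name_ls_random'))
```

Which behavior is right depends on whether a keyword glued to other text (as in ABCrandom) should count as a match.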