Create a column based on whether a string is a substring in a pandas DataFrame
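One column in my DataFrame holds identifier names that follow a specific naming convention. When the names were entered, they were entered inconsistently. How can I find a specific keyword in each name in Python and put it in its own column? Maybe some sort of loop?
Example: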
types = ['XYZ', 'OPQ', 'MNO', 'ABC']
Current df:
ID ID Name
45 I_name_ls_XYZ_random
46 I_22_name_ABC_random
47 I_name_ls_XYZ_random_45
48 I_name_ls_MNO_random
49 I_ls_OPQ_random_name
50 I_name_ls_ABC_random
51 I_name_ls_XYZ_random
52 I_name_MNO_random
Desired result:
ID ID Name types
45 I_name_ls_XYZ_random XYZ
46 I_22_name_ABC_random ABC
47 I_name_ls_XYZ_random_45 XYZ
48 I_name_ls_MNO_random MNO
49 I_ls_OPQ_random_name OPQ
50 I_name_ls_ABC_random ABC
51 I_name_ls_XYZ_random XYZ
52 I_name_MNO_random MNO
Use str.extract:
df['types'] = df.Name.str.extract('({})'.format('|'.join(types)))
ID Name types
0 45 I_name_ls_XYZ_random XYZ
1 46 I_22_name_ABC_random ABC
2 47 I_name_ls_XYZ_random_45 XYZ
3 48 I_name_ls_MNO_random MNO
4 49 I_ls_OPQ_random_name OPQ
5 50 I_name_ls_ABC_random ABC
6 51 I_name_ls_XYZ_random XYZ
7 52 I_name_MNO_random MNO
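For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample rows, including the last one with no keyword, are made up for illustration. Two details are additions rather than part of the answer above: re.escape, which guards against keywords containing regex metacharacters, and expand=False, which makes str.extract return a Series for a single capture group.

```python
import re
import pandas as pd

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

# Small illustrative frame; the last row is a hypothetical name with no keyword.
df = pd.DataFrame({
    'ID': [45, 46, 49, 99],
    'Name': [
        'I_name_ls_XYZ_random',
        'I_22_name_ABC_random',
        'I_ls_OPQ_random_name',
        'I_name_no_keyword',
    ],
})

# Build one alternation pattern, e.g. '(XYZ|OPQ|MNO|ABC)'.
# re.escape keeps each keyword literal if it ever contains regex metacharacters.
pattern = '({})'.format('|'.join(re.escape(t) for t in types))

# With a single capture group, expand=False yields a Series instead of a DataFrame.
df['types'] = df['Name'].str.extract(pattern, expand=False)
print(df)
```

Rows where none of the keywords occur get a missing value in the new column, so they are easy to find afterwards with df['types'].isna().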
If you may need multiple matches, you can use findall:
df
ID Name
0 45 I_name_ls_XYZ_ABCrandom
df.Name.str.findall(r'|'.join(types))
0 [XYZ, ABC]
Name: Name, dtype: object
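A runnable sketch of the findall variant, assuming a small made-up Series that includes one name with no keyword at all. The str.join step at the end is an optional addition, in case a flat string is more convenient than a list in the new column.

```python
import pandas as pd

types = ['XYZ', 'OPQ', 'MNO', 'ABC']

# Illustrative names: two keywords, one keyword, and no keyword.
names = pd.Series(
    ['I_name_ls_XYZ_ABCrandom', 'I_name_MNO_random', 'I_name_only'],
    name='Name',
)

# findall returns, per row, a list of every non-overlapping match.
matches = names.str.findall('|'.join(types))
print(matches)

# Optional: collapse each list into a comma-separated string.
joined = matches.str.join(',')
print(joined)
```

Rows without any keyword produce an empty list, and therefore an empty string after joining, rather than a missing value.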
Use pd.Series.apply with a custom function and a generator expression:
types = {'XYZ', 'OPQ', 'MNO', 'ABC'}
def string_filter(x):
    return next((i for i in x.split('_') if i in types), None)
df['types'] = df['ID_Name'].apply(string_filter)
print(df)
ID ID_Name types
0 45 I_name_ls_XYZ_random XYZ
1 46 I_22_name_ABC_random ABC
2 47 I_name_ls_XYZ_random_45 XYZ
3 48 I_name_ls_MNO_random MNO
4 49 I_ls_OPQ_random_name OPQ
5 50 I_name_ls_ABC_random ABC
6 51 I_name_ls_XYZ_random XYZ
7 52 I_name_MNO_random MNO
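One thing to be aware of with the split-based approach: it only finds a keyword that is an entire '_'-separated token, whereas a regex alternation matches the keyword anywhere in the string. The plain-Python sketch below illustrates the difference. token_filter mirrors the function above, while substring_filter is a hypothetical helper written only for this comparison.

```python
types = {'XYZ', 'OPQ', 'MNO', 'ABC'}

def token_filter(name):
    """Return the first '_'-separated token that is exactly a keyword, else None."""
    return next((tok for tok in name.split('_') if tok in types), None)

def substring_filter(name):
    """Return the keyword that starts earliest anywhere in the name, else None."""
    found = [(name.find(t), t) for t in types if t in name]
    return min(found)[1] if found else None

clean = 'I_name_ls_XYZ_random'
fused = 'I_name_ls_ABCrandom'   # keyword fused to the next word, no '_' after it

print(token_filter(clean), substring_filter(clean))
print(token_filter(fused), substring_filter(fused))
```

For the clean name both functions agree. For the fused name only the substring version finds ABC, so which behavior is right depends on how strictly the naming convention was followed.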