
How to check if all substrings in a pandas column are the same?

I have this column and I want to check whether all of its strings contain the substring anr12. How can I check this? And if all the substrings are the same, how can I drop that particular substring?

I think you want to check with str.contains combined with all, which tests that every value is True, and then remove the substring with str.replace:

import pandas as pd

df = pd.DataFrame({'A':['123anr12', '345anr12']})
print(df)
          A
0  123anr12
1  345anr12

# Remove 'anr12' only if every value in the column contains it
if df['A'].str.contains('anr12', regex=False).all():
    df['A'] = df['A'].str.replace('anr12', '', regex=False)
print(df)

     A
0  123
1  345
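
The check-then-replace step can be wrapped in a small helper. This is only a sketch: the name drop_if_in_all is hypothetical and not part of pandas. It assumes the substring should be matched literally (regex=False), since str.contains treats its pattern as a regular expression by default, and it assumes missing values should count as "does not contain" (na=False).

```python
import pandas as pd

def drop_if_in_all(series: pd.Series, sub: str) -> pd.Series:
    """Remove `sub` from every value, but only if every value contains it."""
    # regex=False matches `sub` literally; na=False treats missing values as non-matches.
    has_sub = series.str.contains(sub, regex=False, na=False)
    if has_sub.all():
        return series.str.replace(sub, '', regex=False)
    # At least one value lacks the substring, so leave the column untouched.
    return series

s = pd.Series(['1.5anr.12', '3.4anr.12'])
print(drop_if_in_all(s, 'anr.12').tolist())  # ['1.5', '3.4']
```

Literal matching matters when the substring contains characters such as '.' that have a special meaning in regular expressions.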

EDIT1: You can use a dictionary to look up the substring for each column:

train_df = pd.DataFrame({'477':['123nbf12', '34nbf12'], 
                         '479':['tt1', '32'], 
                         '482':['anr1234', '345anr12a12']})

obj_features = ['477', '479', '482']  # column names
substring = ['nbf', 'tt1', 'anr12']   # substring to remove from each column
d = dict(zip(obj_features, substring))
print(d)
{'477': 'nbf', '479': 'tt1', '482': 'anr12'}

for k, v in d.items():
    # Strip the substring only when every row of the column contains it
    if train_df[k].str.contains(v, regex=False).all():
        train_df[k] = train_df[k].str.replace(v, '', regex=False)
print(train_df)
     477  479     482
0  12312  tt1      34
1   3412   32  345a12
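
If it is useful to know which columns were actually modified, the same loop could be written as a function that leaves the input untouched. This is a sketch under assumptions: strip_substrings is a hypothetical name, matching is literal (regex=False), and missing values count as non-matches (na=False).

```python
import pandas as pd

def strip_substrings(df: pd.DataFrame, mapping: dict):
    """Return a cleaned copy of df and the list of columns that were changed."""
    out = df.copy()
    changed = []
    for col, sub in mapping.items():
        # Only strip when every row of the column contains the substring.
        if out[col].str.contains(sub, regex=False, na=False).all():
            out[col] = out[col].str.replace(sub, '', regex=False)
            changed.append(col)
    return out, changed

train_df = pd.DataFrame({'477': ['123nbf12', '34nbf12'],
                         '479': ['tt1', '32'],
                         '482': ['anr1234', '345anr12a12']})
cleaned, changed = strip_substrings(
    train_df, {'477': 'nbf', '479': 'tt1', '482': 'anr12'})
print(changed)  # ['477', '482']
```

Column '479' is skipped because '32' does not contain 'tt1', which matches the output above.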
