简体   繁体   中英

How to replace a string in a list if it contains a substring in Pandas DataFrame column

I have a df:

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})


df

Output:

       age    gender    symptoms
0       31      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory disease, cough]
2       23      male    [fever]
3       33      male    [respiratory distress]

I am trying to replace all instances of values in the 'symptom' column (which are lists in this case) that contain the substring "respiratory", and change the entire value in that list to "acute respiratory distress" so it is uniform through out the data frame. This is the desired outcome:

Output:

       age    gender    symptoms
0       31      male    [acute respiratory distress, fever]
1       62      female  [acute respiratory distress, cough]
2       23      male    [fever]
3       33      male    [acute respiratory distress]

I have tried:

df.loc[df['symptoms'].str.contains('respiratory', na=False), 'symptoms'] = 'acute respiratory 
distress'

print(df)

The data frame remains as it was however.

Like this:

import pandas as pd

df = pd.DataFrame({'age': [13,62,53, 33],
                   'gender': ['male','female','male', 'male'],
                   'symptoms': [['acute respiratory distress', 'fever'],
                                ['acute respiratory disease', 'cough'],
                                ['fever'],
                                ['respiratory distress']]})

df['symptoms'] = [['acute respiratory disease' if 'respiratory' in s else s for s in lst] for lst in df['symptoms']]
       
print(df)

Output:

   age  gender                            symptoms
0   13    male  [acute respiratory disease, fever]
1   62  female  [acute respiratory disease, cough]
2   53    male                             [fever]
3   33    male         [acute respiratory disease]

Join the explode , then use contains assign

>>> s = df.symptoms.explode()
>>> df['symptoms'] = s.mask(s.str.contains('respiratory'),'acute respiratory distress').groupby(level=0).agg(list)
>>> df
   age  gender                             symptoms
0   13    male  [acute respiratory distress, fever]
1   62  female  [acute respiratory distress, cough]
2   53    male                              [fever]
3   33    male         [acute respiratory distress]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM