
How to do string operations on a pandas DataFrame

I have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'a': [10, 11, None],
                   'b': ['apple;', None, 'orange;'],
                   'c': ['red', 'blue', 'green']})

I'm trying to strip the ';' from those strings. I tried:

df.select_dtypes(include=['object']).applymap(lambda x: x.strip(';'))

I got this error message:

AttributeError: ("'NoneType' object has no attribute 'strip'", 'occurred at index b')

It seems the None values are causing the trouble. Any help is greatly appreciated. Thanks a lot.

The problem is that some of the values are None, and you can't call None.strip().

df.select_dtypes(include=['object'])
         b      c
0   apple;    red
1     None   blue
2  orange;  green
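
As a minimal sketch of why the call fails: applymap calls the function on every cell, including the None in column b. The list `cells` below is a hypothetical stand-in for that column, not part of the original code.

```python
# Hypothetical stand-in for the values of column b.
cells = ['apple;', None, 'orange;']

errors = []
for cell in cells:
    try:
        # Works for strings, fails for None because None has no strip method.
        cell.strip(';')
    except AttributeError as exc:
        errors.append(str(exc))

print(errors)
```

Only the None cell ends up in `errors`, with the same message that appears in the traceback from the question.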

What you can do is strip only when the value is truthy (which skips None), and otherwise return the value unchanged:

df.select_dtypes(include=['object']).applymap(lambda x: x.strip(';') if x else x)
        b      c
0   apple    red
1    None   blue
2  orange  green
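
One caveat with a truthiness check is that NaN is truthy, so a float NaN in an object column would still reach strip and fail. A possible variant, sketched here, tests for str explicitly. The helper name `strip_semicolon` and the extra NaN and ';' cells are made up for illustration. Recent pandas versions deprecate applymap in favor of DataFrame.map, so the sketch maps each column with Series.map, which exists in both; the columns are built with dtype=object on the assumption that strings are stored as object, as in the question.

```python
import numpy as np
import pandas as pd


def strip_semicolon(value):
    """Strip ';' from strings; return any other value unchanged."""
    return value.strip(';') if isinstance(value, str) else value


# Hypothetical data: column b holds strings, None and NaN.
df = pd.DataFrame({
    'b': pd.Series(['apple;', None, np.nan, ';'], dtype=object),
    'c': pd.Series(['red', 'blue', 'green', 'black'], dtype=object),
})

# Apply the guarded function cell by cell, one column at a time.
cleaned = df.select_dtypes(include=['object']).apply(
    lambda col: col.map(strip_semicolon)
)
print(cleaned)
```

Missing cells stay missing, and a cell that is only ';' becomes an empty string rather than being skipped.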

You can use try and except in this case.

>>> def am(o):
...    try:
...       return o.strip(';')
...    except AttributeError:
...       return o

Then applymap as you have tried:

>>> df.select_dtypes(include=['object']).applymap(am)
        b      c
0   apple    red
1    None   blue
2  orange  green

Use the Series str accessor with apply instead of applymap:

In [17]: df.select_dtypes(include=['object']).apply(lambda S:S.str.strip(';'))
Out[17]: 
        b      c
0   apple    red
1    None   blue
2  orange  green
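
The call above returns a new frame containing only the object columns. If the goal is to update the original frame in place, one option is to assign the result back to the same columns. This is a sketch rather than part of the original answer; the name `obj_cols` is illustrative, and the string columns are created with dtype=object on the assumption that strings are stored as object, as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [10, 11, None],
    'b': pd.Series(['apple;', None, 'orange;'], dtype=object),
    'c': pd.Series(['red', 'blue', 'green'], dtype=object),
})

# Names of the object-dtype columns (b and c here).
obj_cols = df.select_dtypes(include=['object']).columns

# Strip ';' column by column and write the result back into df.
df[obj_cols] = df[obj_cols].apply(lambda s: s.str.strip(';'))
print(df)
```

Column a is left untouched because it is not of object dtype.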


A different approach is to iterate over all columns of dtype object and use Series.str.strip, which leaves missing values (None/NaN) as missing:

for col in df.columns[df.dtypes == object]:
    df[col] = df[col].str.strip(";")
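
The loop modifies df directly. If the original frame should stay unchanged, the same idea can be wrapped in a small helper that works on a copy. This is only a sketch: `strip_object_columns` and its `chars` parameter are hypothetical names, and the string columns are built with dtype=object on the assumption that strings are stored as object, as in the question.

```python
import pandas as pd


def strip_object_columns(frame, chars=';'):
    """Return a copy of frame with chars stripped from every object column."""
    result = frame.copy()
    for col in result.select_dtypes(include=['object']).columns:
        # The .str accessor leaves missing values as missing.
        result[col] = result[col].str.strip(chars)
    return result


df = pd.DataFrame({
    'a': [10, 11, None],
    'b': pd.Series(['apple;', None, 'orange;'], dtype=object),
    'c': pd.Series(['red', 'blue', 'green'], dtype=object),
})

cleaned = strip_object_columns(df)
print(cleaned)
```

Because the helper copies the frame first, df still holds the original 'apple;' and 'orange;' values afterwards.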
