
Most Pythonic way to remove special characters from rows in a column in Pandas

When I call df.head() on my Pandas dataframe, I get the following:

0                                          New YorkÊ
1                                       Los AngelesÊ
2                                           ChicagoÊ
3                                            LondonÊ
4                                           HoustonÊ
Name: cities, dtype: object

As you can see, there is an extra character of some sort at the end of the cities column. So, I remove this character with the following code:

df['cities'] = df['cities'].str.replace('Ê', '')

This works, but is it the best (most Pythonic) way to remove this character?

Thanks!

Nothing's wrong with your solution per se, but you might be better off with a general solution that drops all non-ASCII characters:

>>> df['cities'] = df['cities'].str.encode('ascii', 'ignore').str.decode('ascii')
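One thing worth checking before adopting this: the encode/decode round trip drops every non-ASCII character, not just the stray one. A minimal sketch, using made-up sample data (the 'São Paulo' row is an assumption, not from the question):

```python
import pandas as pd

# Hypothetical sample data: one plain name and one with a legitimate accent
df = pd.DataFrame({'cities': ['New YorkÊ', 'São PauloÊ']})

# Encode to ASCII, silently dropping anything that cannot be represented,
# then decode the resulting bytes back to str
cleaned = df['cities'].str.encode('ascii', 'ignore').str.decode('ascii')

print(cleaned.tolist())  # ['New York', 'So Paulo']
```

The trailing 'Ê' is gone, but so is the 'ã', so this is probably only appropriate when the column is expected to be pure ASCII.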

What if a city name legitimately contains that character? A safer method is to strip it only from the end of each string:

df['cities'] = df['cities'].str.rstrip('Ê')

This can still misfire if a city name written in capitals genuinely ends in that character, but the risk is much lower.
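To make the difference concrete, here is a small comparison of the two approaches. The name 'Êtretat' is invented purely for illustration, so that the character appears both at the start and at the end:

```python
import pandas as pd

# Hypothetical sample data; 'ÊtretatÊ' is a made-up value
df = pd.DataFrame({'cities': ['New YorkÊ', 'ÊtretatÊ']})

# Removes the character everywhere in the string
replaced = df['cities'].str.replace('Ê', '', regex=False)

# Removes the character only from the right-hand end
stripped = df['cities'].str.rstrip('Ê')

print(replaced.tolist())  # ['New York', 'tretat']
print(stripped.tolist())  # ['New York', 'Êtretat']
```

Note that str.rstrip removes every trailing occurrence of the character, not just one, so a value ending in 'ÊÊ' would lose both.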


 