Most Pythonic way to remove special characters from rows in a column in Pandas

Question

When I call df.head() on my Pandas dataframe, I get the following:

0                                          New YorkÊ
1                                       Los AngelesÊ
2                                           ChicagoÊ
3                                            LondonÊ
4                                           HoustonÊ
Name: cities, dtype: object

As you can see, there is an extra character at the end of each value in the cities column. I remove this character with the following code:

df['cities'] = df['cities'].str.replace('Ê', '')

This works, but is this the best (most Pythonic) way to remove this character?

Thanks!

Answer 1

Nothing's wrong with your solution per se, but you might be better off applying a general solution that removes all non-ASCII characters:

>>> df['cities'] = df['cities'].str.encode('ascii', 'ignore').str.decode('ascii')
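
To see what the encode/decode round trip does in practice, here is a minimal sketch. The sample data is hypothetical, and "São Paulo" is added only to show a side effect: every non-ASCII character is dropped, including ones that legitimately belong to a name.

```python
import pandas as pd

# Hypothetical sample data; "São Paulo" is not in the original question and
# is included only to illustrate the side effect.
df = pd.DataFrame({"cities": ["New YorkÊ", "LondonÊ", "São PauloÊ"]})

# Encode to ASCII bytes, silently dropping anything that cannot be encoded,
# then decode the bytes back to str.
cleaned = df["cities"].str.encode("ascii", "ignore").str.decode("ascii")

print(cleaned.tolist())  # ['New York', 'London', 'So Paulo']
```

If the column may contain accented names that should be kept, this approach is probably too aggressive, and removing only the known stray character is the safer choice.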

Answer 2

What if a city name legitimately includes that character? A safer method is to strip it only from the end of each value:

df['cities'] = df['cities'].str.rstrip('Ê')

This may still go wrong if you have a capitalized city name that ends in that character, but the risk is reduced.
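
The difference between the two approaches can be sketched as follows. The value "ÊxampleÊ" is a made-up name, used only to show a case where the character also appears somewhere other than the end.

```python
import pandas as pd

# Hypothetical sample data; "ÊxampleÊ" is an invented value that contains
# the character both at the start and at the end.
df = pd.DataFrame({"cities": ["ChicagoÊ", "ÊxampleÊ"]})

# str.replace removes every occurrence, wherever it appears in the string.
replaced = df["cities"].str.replace("Ê", "", regex=False)

# str.rstrip removes the character only from the right-hand end.
stripped = df["cities"].str.rstrip("Ê")

print(replaced.tolist())  # ['Chicago', 'xample']
print(stripped.tolist())  # ['Chicago', 'Êxample']
```

Note that rstrip removes all trailing occurrences of the character, so a value that genuinely ends in it would still lose that final letter.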
