After struggling with a CSV file's encoding, I decided to commit the encoding heresy of manually replacing some characters.
This is how the dataframe looks:
import pandas as pd

df = pd.DataFrame({'a': 'bÉd encoded',
                   'b': ['foo', 'bar'] * 3,
                   'c': 'bÉd encoded too'})
             a    b                c
0  bÉd encoded  foo  bÉd encoded too
1  bÉd encoded  bar  bÉd encoded too
2  bÉd encoded  foo  bÉd encoded too
3  bÉd encoded  bar  bÉd encoded too
4  bÉd encoded  foo  bÉd encoded too
5  bÉd encoded  bar  bÉd encoded too
If my only problem were column 'a', this function would be enough:
def force_good_e(row):
    col = row['a']
    if 'É' in col:
        col = col.replace('É', 'a')
    return col

df['a'] = df.apply(force_good_e, axis=1)
But then I would need another function for column 'c'.
I improved on it by passing the column name as a parameter:
def force_good_es(row, column):
    col = row[column]
    if 'É' in col:
        col = col.replace('É', 'a')
    return col

df['a'] = df.apply(lambda x: force_good_es(x, 'a'), axis=1)
df['c'] = df.apply(lambda x: force_good_es(x, 'c'), axis=1)
But it got me wondering: is there a better way to do this, i.e. one that eliminates the need to write one line of
df[n] = df.apply(lambda x: force_good_es(x, n), axis=1)
for each column n that needs to be fixed?
You could use str.replace:
df['a'] = df['a'].str.replace('É','a')
df['c'] = df['c'].str.replace('É','a')
Or, as @wen mentioned in the comments:
df = df.replace({'É':'a'},regex=True)
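As a small check of why regex=True matters in that call, here is a sketch on the sample frame from the question. Without it, DataFrame.replace only matches whole cell values, so an É inside a longer string is left untouched. The variable names whole_cell and substring are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': 'bÉd encoded',
                   'b': ['foo', 'bar'] * 3,
                   'c': 'bÉd encoded too'})

# Without regex=True, replace only matches cells that are exactly 'É',
# so nothing changes here.
whole_cell = df.replace({'É': 'a'})

# With regex=True, the key is treated as a pattern and matched
# inside each string, in every column.
substring = df.replace({'É': 'a'}, regex=True)

print(whole_cell['a'].iloc[0])
print(substring['a'].iloc[0])
```

Note that this form touches every column, which is fine here because column 'b' contains no É.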
If the character occurs in all columns but you want to replace it only in selected columns, and you want to use apply:
df.iloc[:,[0,2]].apply(lambda x: x.str.replace('É','a'), axis=1)
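Occurrences of É in the first and third columns will be replaced by a. Note that this expression returns a new DataFrame containing only those two columns, so assign the result back if you want to keep the change.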
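To avoid writing one line per column, the columns to fix can be kept in a list and updated in a single assignment. This is a minimal sketch on the sample frame from the question; the name cols_to_fix is hypothetical, and regex=False is passed only to make explicit that 'É' is meant as a literal string.

```python
import pandas as pd

df = pd.DataFrame({'a': 'bÉd encoded',
                   'b': ['foo', 'bar'] * 3,
                   'c': 'bÉd encoded too'})

# Hypothetical name: the columns that need the character fixed.
cols_to_fix = ['a', 'c']

# apply without axis=1 passes each selected column as a Series,
# so the vectorized .str.replace runs once per column.
df[cols_to_fix] = df[cols_to_fix].apply(
    lambda s: s.str.replace('É', 'a', regex=False)
)

print(df)
```

Column 'b' is not in the list, so it is left as it was.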