I have the following DataFrame
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
Borough 20 non-null object
Indian 20 non-null object
Pakistani 20 non-null object
Bangladeshi 20 non-null object
Chinese 20 non-null object
Other_Asian 20 non-null object
Total_Asian 20 non-null object
dtypes: object(7)
Only the 'Borough' column is a string; the others should be int or float. I am trying to convert them using astype(int). I have tried all the options I could find on the internet, but I still get an error.
df_LondonEthnicity['Indian'] = df_LondonEthnicity['Indian'].astype(int)
The error is:
invalid literal for int() with base 10:
I also tried
df_LondonEthnicity['Indian'] = df_LondonEthnicity.astype({'Indian': int}).dtypes
I also tried
cols = ['Indian', 'Pakistani', 'Bangladeshi', 'Chinese', 'Other_Asian', 'Total_Asian']
for col in cols:  # Iterate over chosen columns
    df_LondonEthnicity[col] = pd.to_numeric(df_LondonEthnicity[col])
I also tried converting to string and then to float.
I'd appreciate some help on this. Thanks
As pointed out in the comments, you need to use the to_numeric function.
The error means that the value you are trying to convert contains characters other than the digits 0-9 (base 10).
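To see which values are causing the error, one approach is to coerce the column and then look at the original entries that turned into NaN. This is only a sketch: the Series below is made-up sample data standing in for one of the object columns, not the actual contents of df_LondonEthnicity.

```python
import pandas as pd

# Hypothetical sample data; the real column contents are not shown in the question.
s = pd.Series(["123", "1,234", "200", "-"])

# Anything to_numeric cannot parse becomes NaN when errors="coerce".
converted = pd.to_numeric(s, errors="coerce")

# Keep the original entries that failed to parse (and were not missing to begin with).
bad_values = s[converted.isna() & s.notna()]
print(bad_values.tolist())
```

Inspecting the offending entries first usually tells you whether they are thousands separators, placeholders for missing data, or something else.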
So you have two options: either use pd.to_numeric and let all the non-conforming values become NaN, or clean the values up before converting them.
So say you have a dataframe like this.
>>> df
X
0 123
1 123,
2 200
3 200.1
Using pd.to_numeric with errors='coerce' gives the output below. Note that the values are floats.
>>> pd.to_numeric(df.X, errors='coerce')
0 123.0
1 NaN
2 200.0
3 200.1
Name: X, dtype: float64
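If you want integers rather than floats after coercing, one possibility is pandas' nullable integer dtype "Int64" (capital I), which can hold missing values. A minimal sketch, assuming the column has no fractional values (a value like 200.1 cannot be cast safely to an integer dtype and would raise an error); the sample data here is made up:

```python
import pandas as pd

# Hypothetical sample: whole numbers stored as strings, one of them malformed.
s = pd.Series(["123", "123,", "200"])

# Coerce unparseable entries to NaN, then cast to the nullable integer dtype.
result = pd.to_numeric(s, errors="coerce").astype("Int64")
print(result)
```

The malformed entry shows up as <NA> and the rest stay integers.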
The other option is to extract the leading digits from each value and convert those, like this.
>>> df.X.str.extract(r'([\d]+)').astype(int)
0
0 123
1 123
2 200
3 200
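Note that extracting the first run of digits truncates a value like 1,234 to 1. If the real data turns out to contain thousands separators (an assumption, since the question does not show the raw values), stripping the commas before calling pd.to_numeric may be closer to what you want. A sketch with a made-up frame and hypothetical borough names:

```python
import pandas as pd

# Hypothetical data; column names follow the question, values are invented.
df = pd.DataFrame({
    "Borough": ["A", "B", "C"],
    "Indian": ["1,234", "567", "89"],
    "Pakistani": ["12", "3,456", "7"],
})

cols = ["Indian", "Pakistani"]
for col in cols:
    # Drop thousands separators and surrounding whitespace, then convert.
    cleaned = df[col].str.replace(",", "", regex=False).str.strip()
    df[col] = pd.to_numeric(cleaned, errors="coerce")

print(df.dtypes)
```

Anything that still cannot be parsed after cleaning ends up as NaN, so it is worth checking df[cols].isna().sum() afterwards.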