Error converting DataFrame object-type columns to int or float

Question

I have the following DataFrame:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 7 columns):
Borough        20 non-null object
Indian         20 non-null object
Pakistani      20 non-null object
Bangladeshi    20 non-null object
Chinese        20 non-null object
Other_Asian    20 non-null object
Total_Asian    20 non-null object
dtypes: object(7)

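Only the 'Borough' column is a string; the other columns should be int or float. I am trying to convert them with astype(int). I have tried every option I could find online, but I still get an error.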

df_LondonEthnicity['Indian'] = df_LondonEthnicity['Indian'].astype(int)

The error is:

invalid literal for int() with base 10:

I also tried

df_LondonEthnicity['Indian'] = df_LondonEthnicity.astype({'Indian': int}).dtypes

I also tried

cols = ['Indian', 'Pakistani', 'Bangladeshi', 'Chinese', 'Other_Asian', 'Total_Asian']  

for col in cols:  # Iterate over chosen columns
  df_LondonEthnicity[col] = pd.to_numeric(df_LondonEthnicity[col])

I also tried converting to string first and then to float.

I would appreciate some help with this. Thanks.

Answer 1

As pointed out in the comments, you need to use the to_numeric function.

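The error means that some of the values you are trying to convert contain characters other than the digits 0-9 (base 10), for example a decimal point or a thousands separator.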
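A minimal sketch of what this looks like, using a hypothetical column X with made-up values (not the asker's data). It reproduces the ValueError and then uses to_numeric with errors='coerce' to list the values that cannot be parsed at all, which is a quick way to find the offending characters:

```python
import pandas as pd

# Hypothetical data stored as strings (object dtype), like the asker's columns
df = pd.DataFrame({"X": ["123", "1,234", "200", "200.1"]}, dtype=object)

# astype(int) calls int() on each string and fails on the first non-digit value
try:
    df["X"].astype(int)
except ValueError as exc:
    print(exc)  # the same "invalid literal for int() with base 10" message

# Values that cannot be parsed as a number at all. Note that "200.1" parses
# as a float, so it is not listed here even though int() rejects it.
bad = df.loc[pd.to_numeric(df["X"], errors="coerce").isna(), "X"]
print(bad.tolist())
```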

So your options are to use pd.to_numeric and turn every value that does not parse into NaN, or to convert those values into something parseable first.

Say you have a DataFrame with a string column X like this.

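Using pd.to_numeric with errors='coerce' produces the output below. Note that the values are floats: NaN cannot be stored in a plain integer column.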

>>> pd.to_numeric(df.X, errors='coerce')
0    123.0
1      NaN
2    200.0
3    200.1
Name: X, dtype: float64
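The loop in the question called pd.to_numeric without an errors argument, so it still raises on the first unparseable value. A sketch of the same loop with errors='coerce' added; df_demo and its values (including the "n/a" entry) are hypothetical, and only the column names come from the question. If you need integers and can live with missing values, pandas' nullable "Int64" dtype stores them as <NA>:

```python
import pandas as pd

cols = ["Indian", "Pakistani", "Bangladeshi", "Chinese", "Other_Asian", "Total_Asian"]

# Hypothetical stand-in for df_LondonEthnicity: all columns are object dtype,
# and "n/a" plays the role of an unparseable entry
df_demo = pd.DataFrame(
    {"Borough": ["A", "B", "C"], **{c: ["10", "n/a", "30"] for c in cols}},
    dtype=object,
)

# Same loop as in the question, plus errors="coerce": bad strings become NaN
for col in cols:
    df_demo[col] = pd.to_numeric(df_demo[col], errors="coerce")

# Optional: nullable integers (<NA> instead of NaN). This cast only succeeds
# when every non-missing value is a whole number.
df_demo = df_demo.astype({c: "Int64" for c in cols})

print(df_demo.dtypes)
```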

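The other option is to convert the strings first, for example by extracting the digits with a regular expression. Note that str.extract keeps only the first run of digits, so 200.1 becomes 200: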

>>> df.X.str.extract(r'([\d]+)').astype(int)
     0
0  123
1  123
2  200
3  200
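To assign the extracted digits back to the column, passing expand=False to str.extract returns a Series instead of the one-column DataFrame (with the 0 header) shown above. A sketch with hypothetical values chosen to match that output; the second value is invented, since the original DataFrame is not shown:

```python
import pandas as pd

# Hypothetical values: "123abc" stands in for the unknown non-numeric second row
df = pd.DataFrame({"X": ["123", "123abc", "200", "200.1"]}, dtype=object)

# Keep the first run of digits in each string, then cast to int.
# expand=False returns a Series (not a one-column DataFrame),
# so the result can be assigned straight back to a column.
df["X_int"] = df["X"].str.extract(r"(\d+)", expand=False).astype(int)

print(df["X_int"].tolist())  # [123, 123, 200, 200]
```

If a value contains no digits at all, extract returns NaN for it and astype(int) fails again; in that case, pass the extracted Series through pd.to_numeric(..., errors='coerce') instead.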

Solution 1
0 2019-12-20 17:53:54
