
Replace specific value in pandas dataframe column, else convert column to numeric

Given the following pandas dataframe

+----+------------------+-------------------------------------+--------------------------------+
|    |   AgeAt_X        |   AgeAt_Y                           |   AgeAt_Z                      |
|----+------------------+-------------------------------------+--------------------------------+
|  0 |   Older than 100 |                      Older than 100 |                          74.13 |
|  1 |              nan |                                 nan |                          58.46 |
|  2 |              nan |                                 8.4 |                          54.15 |
|  3 |              nan |                                 nan |                          57.04 |
|  4 |              nan |                               57.04 |                            nan |
+----+------------------+-------------------------------------+--------------------------------+

How can I replace the value Older than 100 with nan in specific columns, so that the result looks like this?

+----+------------------+-------------------------------------+--------------------------------+
|    |   AgeAt_X        |   AgeAt_Y                           |   AgeAt_Z                      |
|----+------------------+-------------------------------------+--------------------------------+
|  0 |              nan |                                 nan |                          74.13 |
|  1 |              nan |                                 nan |                          58.46 |
|  2 |              nan |                                 8.4 |                          54.15 |
|  3 |              nan |                                 nan |                          57.04 |
|  4 |              nan |                               57.04 |                            nan |
+----+------------------+-------------------------------------+--------------------------------+

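Notes

  • After removing the Older than 100 string from the desired columns, I convert those columns to numeric so that I can perform calculations on them.
  • There are other columns in this dataframe (which I have excluded from this example) that will not be converted to numeric, so the conversion to numeric must be done one column at a time.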

What I have tried

Attempt 1

if df.isin('Older than 100'):
    df.loc[df['AgeAt_X']] = ''
else:
    df['AgeAt_X'] = pd.to_numeric(df["AgeAt_X"])

Attempt 2

if df.loc[df['AgeAt_X']] == 'Older than 100r':
    df.loc[df['AgeAt_X']] = ''
elif df.loc[df['AgeAt_X']] == '':
    df['AgeAt_X'] = pd.to_numeric(df["AgeAt_X"])

Attempt 3

df['AgeAt_X'] = ['' if ele == 'Older than 100' else df.loc[df['AgeAt_X']] for ele in df['AgeAt_X']]

Attempts 1, 2, and 3 return the following error:

KeyError: 'None of [0 NaN\\n1 NaN\\n2 NaN\\n3 NaN\\n4 NaN\\n5 NaN\\n6 NaN\\n7 NaN\\n8 NaN\\n9 NaN\\n10 NaN\\n11 NaN\\n12 NaN\\n13 NaN\\n14 NaN\\n15 NaN\\n16 NaN\\n17 NaN\\n18 NaN\\n19 NaN\\n20 NaN\\n21 NaN\\n22 NaN\\n23 NaN\\n24 NaN\\n25 NaN\\n26 NaN\\n27 NaN\\n28 NaN\\n29 NaN\\n ..\\n6332 NaN\\n6333 NaN\\n6334 NaN\\n6335 NaN\\n6336 NaN\\n6337 NaN\\n6338 NaN\\n6339 NaN\\n6340 NaN\\n6341 NaN\\n6342 NaN\\n6343 NaN\\n6344 NaN\\n6345 NaN\\n6346 NaN\\n6347 NaN\\n6348 NaN\\n6349 NaN\\n6350 NaN\\n6351 NaN\\n6352 NaN\\n6353 NaN\\n6354 NaN\\n6355 NaN\\n6356 NaN\\n6357 NaN\\n6358 NaN\\n6359 NaN\\n6360 NaN\\n6361 NaN\\nName: AgeAt_X, Length: 6362, dtype: float64] are in the [index]'

Attempt 4

df['AgeAt_X'] = df['AgeAt_X'].replace({'Older than 100': ''})

Attempt 4 returns the following error:

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

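I have also looked at a few posts. The two below do not actually replace the value; they create a new column derived from other columns:

Replace specific values in a Pandas DataFrame

Pandas replace DataFrame values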

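We can loop through each column and check whether the sentence is present. If we get a hit, we replace the sentence with NaN using Series.str.replace and convert the column to numeric right afterwards with Series.astype, in this case to float: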

df.dtypes
AgeAt_X     object
AgeAt_Y     object
AgeAt_Z    float64
dtype: object

sent = 'Older than 100'

for col in df.columns:
    if sent in df[col].values:
        df[col] = df[col].str.replace(sent, 'NaN')
        df[col] = df[col].astype(float)

print(df)
   AgeAt_X  AgeAt_Y  AgeAt_Z
0      NaN      NaN    74.13
1      NaN      NaN    58.46
2      NaN     8.40    54.15
3      NaN      NaN    57.04
4      NaN    57.04      NaN

df.dtypes
AgeAt_X    float64
AgeAt_Y    float64
AgeAt_Z    float64
dtype: object
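
A possible variation on the loop above, offered as a sketch rather than as part of the original answer: pd.to_numeric with errors='coerce' converts one column at a time and turns anything it cannot parse into NaN, so it also copes with columns that mix strings and floats. The Name column and the age_cols list are assumptions added for illustration; they stand in for the other columns that must stay non-numeric.

```python
import numpy as np
import pandas as pd

# Small frame modelled on the question. 'Name' is a hypothetical extra
# column that should NOT be converted to numeric.
df = pd.DataFrame({
    'Name': ['a', 'b', 'c'],
    'AgeAt_X': ['Older than 100', np.nan, np.nan],
    'AgeAt_Y': ['Older than 100', np.nan, '8.4'],
    'AgeAt_Z': [74.13, 58.46, 54.15],
})

# Assumed list of the columns to convert; every other column is left alone.
age_cols = ['AgeAt_X', 'AgeAt_Y', 'AgeAt_Z']

for col in age_cols:
    # errors='coerce' replaces every value that cannot be parsed as a
    # number (including 'Older than 100') with NaN.
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
```

Note the trade-off: errors='coerce' silently turns every unparseable value into NaN, not only Older than 100, so it is worth checking the column for other unexpected strings first.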

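If I understand correctly, you can replace all occurrences of Older than 100 with np.nan using a single call to DataFrame.replace. If all remaining values are numeric, the replacement will implicitly change the data type of the column to numeric: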

import numpy as np
import pandas as pd

# Minimal example DataFrame
df = pd.DataFrame({'AgeAt_X': ['Older than 100', np.nan, np.nan],
                   'AgeAt_Y': ['Older than 100', np.nan, 8.4],
                   'AgeAt_Z': [74.13, 58.46, 54.15]})
df
          AgeAt_X         AgeAt_Y  AgeAt_Z
0  Older than 100  Older than 100    74.13
1             NaN             NaN    58.46
2             NaN             8.4    54.15

df.dtypes
AgeAt_X     object
AgeAt_Y     object
AgeAt_Z    float64
dtype: object

# Replace occurrences of 'Older than 100' with np.nan in any column
df.replace('Older than 100', np.nan, inplace=True)

df
   AgeAt_X  AgeAt_Y  AgeAt_Z
0      NaN      NaN    74.13
1      NaN      NaN    58.46
2      NaN      8.4    54.15

df.dtypes
AgeAt_X    float64
AgeAt_Y    float64
AgeAt_Z    float64
dtype: object
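
The implicit dtype change after replace appears to depend on the pandas version (as far as I know, recent releases warn that this downcasting is deprecated), so it may be safer to convert explicitly. The sketch below is one way to do that under stated assumptions: it masks only the exact sentinel in a chosen list of columns and then calls pd.to_numeric. The cols list is an assumption for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'AgeAt_X': ['Older than 100', np.nan, np.nan],
                   'AgeAt_Y': ['Older than 100', np.nan, 8.4],
                   'AgeAt_Z': [74.13, 58.46, 54.15]})

# Assumed: the columns known to contain the sentinel string.
cols = ['AgeAt_X', 'AgeAt_Y']

for col in cols:
    # Set only the exact sentinel to NaN; all other values are untouched.
    cleaned = df[col].mask(df[col] == 'Older than 100')
    # Convert explicitly. The default errors='raise' makes any other
    # unexpected string fail loudly instead of becoming NaN.
    df[col] = pd.to_numeric(cleaned)

print(df.dtypes)
```

Because only the listed columns are touched, other non-numeric columns in the real dataframe stay as they are, which matches the one-column-at-a-time requirement in the question.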
