Efficiently replace values in one column with values from another column in a Pandas DataFrame

Question

I have a Pandas DataFrame like this:

   col1 col2 col3
1   0.2  0.3  0.3
2   0.2  0.3  0.3
3     0  0.4  0.4
4     0    0  0.3
5     0    0    0
6   0.1  0.4  0.4

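I want to replace the col1 values with the values from the second column (col2) only where col1 equals 0, and then, for the zeros that remain, do the same again using the third column (col3). The desired result is: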

   col1 col2 col3
1   0.2  0.3  0.3
2   0.2  0.3  0.3
3   0.4  0.4  0.4
4   0.3    0  0.3
5     0    0    0
6   0.1  0.4  0.4

I got it working with the replace method, but it seems too slow. I think there must be a faster way to do it.

df.col1.replace(0,df.col2,inplace=True)
df.col1.replace(0,df.col3,inplace=True)

Is there a faster way, using some other function instead of replace?

Answer 1

Using np.where is faster. Following a pattern similar to the one you used with replace:

df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])

However, a nested np.where is slightly faster:

df['col1'] = np.where(df['col1'] == 0, 
                      np.where(df['col2'] == 0, df['col3'], df['col2']),
                      df['col1'])
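For reference, here is a self-contained sketch of the nested form on the 6-row sample from the question. The name fallback is only an illustrative intermediate variable and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (index 1..6).
df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    },
    index=range(1, 7),
)

# Inner where: the value to fall back to (col2, or col3 when col2 is also 0).
fallback = np.where(df["col2"] == 0, df["col3"], df["col2"])

# Outer where: use the fallback only on rows where col1 is 0.
df["col1"] = np.where(df["col1"] == 0, fallback, df["col1"])

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```

The nested version scans col1 once instead of twice, which is probably where the small speed-up comes from.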

Timings

Using the following setup to build a larger sample DataFrame and the functions to time:

import numpy as np
import pandas as pd

# Repeat the 6-row example 10**4 times (60,000 rows).
df = pd.concat([df]*10**4, ignore_index=True)

def root_nested(df):
    df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
    return df

def root_split(df):
    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
    return df

def pir2(df):
    df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
    return df

def pir2_2(df):
    slc = (df.values != 0).argmax(axis=1)
    return df.values[np.arange(slc.shape[0]), slc]

def andrew(df):
    df.col1[df.col1 == 0] = df.col2
    df.col1[df.col1 == 0] = df.col3
    return df

def pablo(df):
    df['col1'] = df['col1'].replace(0,df['col2'])
    df['col1'] = df['col1'].replace(0,df['col3'])
    return df

I get the following timings:

%timeit root_nested(df.copy())
100 loops, best of 3: 2.25 ms per loop

%timeit root_split(df.copy())
100 loops, best of 3: 2.62 ms per loop

%timeit pir2(df.copy())
100 loops, best of 3: 6.25 ms per loop

%timeit pir2_2(df.copy())
1 loop, best of 3: 2.4 ms per loop

%timeit andrew(df.copy())
100 loops, best of 3: 8.55 ms per loop

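I tried to time your method, but it had been running for several minutes without finishing. For comparison, timing your method on just the 6-row example DataFrame (not the larger one tested above) takes 12.8 ms.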
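The %timeit lines are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The names small and big, and the repeat and number settings, are assumptions made for this sketch. Results depend on the machine and the pandas version, so no output is shown.

```python
import timeit

import numpy as np
import pandas as pd

# The 6-row sample from the question, repeated 10**4 times (60,000 rows).
small = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    }
)
big = pd.concat([small] * 10**4, ignore_index=True)


def root_nested(df):
    df["col1"] = np.where(
        df["col1"] == 0,
        np.where(df["col2"] == 0, df["col3"], df["col2"]),
        df["col1"],
    )
    return df


def root_split(df):
    df["col1"] = np.where(df["col1"] == 0, df["col2"], df["col1"])
    df["col1"] = np.where(df["col1"] == 0, df["col3"], df["col1"])
    return df


runs = 10
for func in (root_nested, root_split):
    # Copy inside the timed call so every run starts from unmodified data.
    totals = timeit.repeat(lambda: func(big.copy()), number=runs, repeat=3)
    print(f"{func.__name__}: {min(totals) / runs * 1e3:.2f} ms per call")
```

As with the %timeit calls, the cost of df.copy() is included in every measurement, so it affects all candidates equally.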

Answer 2

I'm not sure whether it's faster, but you're right that you can slice the DataFrame to get the result you want.

df.col1[df.col1 == 0] = df.col2
df.col1[df.col1 == 0] = df.col3
print(df)

Output:

   col1  col2  col3
0   0.2   0.3   0.3
1   0.2   0.3   0.3
2   0.4   0.4   0.4
3   0.3   0.0   0.3
4   0.0   0.0   0.0
5   0.1   0.4   0.4

Or, if you want it more concise (though I don't know whether it's faster), you can combine what you did with what I did.

df.col1[df.col1 == 0] = df.col2.replace(0, df.col3)
print(df)

Output:

   col1  col2  col3
0   0.2   0.3   0.3
1   0.2   0.3   0.3
2   0.4   0.4   0.4
3   0.3   0.0   0.3
4   0.0   0.0   0.0
5   0.1   0.4   0.4
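One caveat: assigning through df.col1[...] is chained indexing. It can trigger SettingWithCopyWarning and, as far as I know, it does not update df when pandas copy-on-write is enabled. Below is a sketch of the same one-step idea written as a single .loc assignment. The name fallback is illustrative, and Series.mask is swapped in for the replace call.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    }
)

# Build the fallback first: col2, or col3 wherever col2 is 0.
fallback = df["col2"].mask(df["col2"].eq(0), df["col3"])

# Single .loc assignment; the right-hand side is aligned on the index,
# so only the rows selected by the boolean mask are overwritten.
df.loc[df["col1"].eq(0), "col1"] = fallback

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```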

Answer 3

Using the pd.DataFrame.where and pd.DataFrame.bfill methods

df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
df
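To make the steps of that one-liner visible, here is a sketch that splits it up on the sample data. Restricting the operation to an explicit column list (cols) is an assumption of this sketch, added so that unrelated columns in a wider DataFrame would not take part in the back-fill.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    },
    index=range(1, 7),
)
cols = ["col1", "col2", "col3"]

# Step 1: turn zeros into NaN so they count as missing.
masked = df[cols].where(df[cols].ne(0))

# Step 2: back-fill along each row, so a NaN takes the next
# non-missing value to its right.
filled = masked.bfill(axis=1)

# Step 3: rows that were all zeros are still NaN; restore them to 0.
df["col1"] = filled["col1"].fillna(0)

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```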

Another approach, using np.argmax

def pir2(df):
    slc = (df.values != 0).argmax(axis=1)
    return df.values[np.arange(slc.shape[0]), slc]

I know there's a better way to do the slicing with numpy. I just can't think of it right now.
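A sketch of how the argmax version behaves on the sample data, this time assigning the result back to col1. The variable names are illustrative. On a boolean array, argmax returns the position of the first True in each row, and 0 when the row has no True, which is why an all-zero row keeps its 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    },
    index=range(1, 7),
)

values = df[["col1", "col2", "col3"]].to_numpy()

# Column position of the first non-zero entry in each row
# (0 when every entry in the row is zero).
first_nonzero = (values != 0).argmax(axis=1)

# Pick one element per row using paired row/column indices.
df["col1"] = values[np.arange(len(values)), first_nonzero]

print(first_nonzero.tolist())  # [0, 0, 1, 2, 0, 0]
print(df["col1"].tolist())     # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```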

Answer 4

In general, there are three ways to do this kind of conditional replacement. They are:

numpy.where
pandas.Series.mask, or pandas.Series.where (the inverse of Series.mask)
pandas.DataFrame.loc

You can try pandas.Series.mask

df['col1'] = df['col1'].mask(df['col1'].eq(0), df['col2'])
df['col1'] = df['col1'].mask(df['col1'].eq(0), df['col3'])

   col1  col2  col3
1   0.2   0.3   0.3
2   0.2   0.3   0.3
3   0.4   0.4   0.4
4   0.3   0.0   0.3
5   0.0   0.0   0.0
6   0.1   0.4   0.4

Or pandas.Series.where

df['col1'] = df['col1'].where(df['col1'].ne(0), df['col2'])
df['col1'] = df['col1'].where(df['col1'].ne(0), df['col3'])

Finally, you can try loc

df.loc[df['col1'].eq(0), 'col1'] = df['col2']
df.loc[df['col1'].eq(0), 'col1'] = df['col3']
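As a quick check that the three spellings agree, here is a sketch that runs each one on a fresh copy of the sample data. The helper names make_df, with_mask, with_where and with_loc are made up for this example.

```python
import pandas as pd


def make_df():
    # Fresh copy of the sample data from the question.
    return pd.DataFrame(
        {
            "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
            "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
            "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
        },
        index=range(1, 7),
    )


def with_mask(df):
    # mask replaces values where the condition is True.
    for other in ("col2", "col3"):
        df["col1"] = df["col1"].mask(df["col1"].eq(0), df[other])
    return df


def with_where(df):
    # where keeps values where the condition is True, so the test is inverted.
    for other in ("col2", "col3"):
        df["col1"] = df["col1"].where(df["col1"].ne(0), df[other])
    return df


def with_loc(df):
    # loc assigns only to the selected rows; the right-hand side aligns on the index.
    for other in ("col2", "col3"):
        df.loc[df["col1"].eq(0), "col1"] = df[other]
    return df


results = [f(make_df())["col1"].tolist() for f in (with_mask, with_where, with_loc)]
print(results[0])                             # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
print(all(r == results[0] for r in results))  # True
```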

Answer 5

Alternatively, you can use combine:

replace_zeros = lambda x, y: y if x == 0 else x
df['col1'].combine(df['col2'], func=replace_zeros).combine(df['col3'], func=replace_zeros)

Output:

1    0.2
2    0.2
3    0.4
4    0.3
5    0.0
6    0.1
dtype: float64
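Note that the snippet returns a new Series and leaves df unchanged. A sketch that assigns the result back to col1 is shown below, with a named function in place of the lambda. Since combine calls the Python function once per pair of elements, I would expect it to be slower than the vectorised options on large frames, though I have not timed it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [0.2, 0.2, 0.0, 0.0, 0.0, 0.1],
        "col2": [0.3, 0.3, 0.4, 0.0, 0.0, 0.4],
        "col3": [0.3, 0.3, 0.4, 0.3, 0.0, 0.4],
    },
    index=range(1, 7),
)


def replace_zeros(x, y):
    # Called element by element: keep x unless it is 0, otherwise take y.
    return y if x == 0 else x


# Chain the two fallbacks and write the result back to col1.
df["col1"] = (
    df["col1"]
    .combine(df["col2"], replace_zeros)
    .combine(df["col3"], replace_zeros)
)

print(df["col1"].tolist())  # [0.2, 0.2, 0.4, 0.3, 0.0, 0.1]
```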

Solution 1
53 Accepted 2016-10-06 19:11:46

Solution 2
10 2016-10-06 19:03:41

Solution 3
3 2016-10-06 19:25:37

Solution 4
0 2022-05-09 19:59:37

Solution 5
0 2022-07-09 18:22:10
