Create a new column based on a condition on another column using apply and a lambda function

Question

I have the following df:

operator_id	total_records	avg_wait_time	is_missed_call	out_calls_cnt
0	879896.0	117	17.958253	47
1	879898.0	227	17.239858	89
2	880020.0	20	6.815000	6
3	880022.0	70	16.172996	29

I'm trying to create a new column named "test" that shows out_calls_cnt as a share of total_records, but only when out_calls_cnt is greater than 10; otherwise the function should return 0.

I assume that looping over the rows with a one-line function is inefficient.
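To check that assumption, a rough timing sketch like the one below could be used. It assumes a hypothetical DataFrame of 20,000 random rows (not the asker's data) containing only the two columns involved. Absolute times depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 20,000 rows of random counts, not the asker's data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "out_calls_cnt": rng.integers(0, 100, size=20_000),
    "total_records": rng.integers(100, 1_000, size=20_000),
})

def with_apply(frame):
    # Row by row: the lambda is called once per row (axis=1).
    return frame.apply(
        lambda row: row["out_calls_cnt"] / row["total_records"]
        if row["out_calls_cnt"] > 10 else 0,
        axis=1,
    )

def with_where(frame):
    # Vectorized: one division over whole columns, then mask with where().
    ratio = frame["out_calls_cnt"] / frame["total_records"]
    return ratio.where(frame["out_calls_cnt"] > 10, 0)

print("apply:", timeit.timeit(lambda: with_apply(df), number=1))
print("where:", timeit.timeit(lambda: with_where(df), number=1))
```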

My code:

dataset_operators['test'] = dataset_operators[['out_calls_cnt', 'total_records']].apply(lambda x:  dataset_operators['out_calls_cnt'] / dataset_operators['total_rows'] if dataset_operators['out_calls_cnt'] > 10 else 0, axis = 1)

I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
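The error comes from the lambda ignoring its argument x: dataset_operators['out_calls_cnt'] > 10 is a whole boolean Series, and a Python if/else expression needs a single True or False. A minimal reproduction, using a made-up two-element Series in place of the real column:

```python
import pandas as pd

s = pd.Series([47, 6])  # hypothetical out_calls_cnt values

message = None
try:
    # `s > 10` is a boolean Series; `if` cannot reduce it to one True/False.
    label = "busy" if s > 10 else "quiet"
except ValueError as exc:
    message = str(exc)
print(message)

# A single scalar (what apply(axis=1) gives you per row) works fine.
first = s.iloc[0]
print("busy" if first > 10 else "quiet")
```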

I'd like to solve it with a lambda, even though I managed to solve it using where:

dataset_operators['test'] = (dataset_operators['out_calls_cnt'] / dataset_operators['total_records']).where(dataset_operators['out_calls_cnt'] > 10, 0)
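For reference, here is that where-based line run end to end. The four-row frame is hypothetical: the column names come from the question, but the values are made up. Series.where keeps each value where the condition is True and substitutes the second argument (0) everywhere else.

```python
import pandas as pd

# Hypothetical sample: column names from the question, values made up.
dataset_operators = pd.DataFrame({
    "out_calls_cnt": [47, 89, 6, 29],
    "total_records": [117, 227, 20, 70],
})

ratio = dataset_operators["out_calls_cnt"] / dataset_operators["total_records"]
# Keep the ratio where out_calls_cnt > 10, put 0 in the other rows.
dataset_operators["test"] = ratio.where(dataset_operators["out_calls_cnt"] > 10, 0)
print(dataset_operators)
```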

Answer 1

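Here is an alternative using np.where, based on the sample you showed; please try the following. It creates a new column named test in df, which you can rename as needed.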

import numpy as np
import pandas as pd
df['test'] = np.where(df['out_calls_cnt']>10,df['out_calls_cnt'] / df['total_records'],0)
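A self-contained sketch of the np.where approach on a hypothetical two-column frame (values made up). np.where evaluates both branches for every row, so the division is computed even where the condition is False and those results are simply discarded. It also returns a plain NumPy array rather than a pandas Series, which is then assigned to the column by position.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "out_calls_cnt": [47, 89, 6, 29],
    "total_records": [117, 227, 20, 70],
})

# np.where(condition, value_if_true, value_if_false), applied element-wise.
test = np.where(df["out_calls_cnt"] > 10,
                df["out_calls_cnt"] / df["total_records"],
                0)
print(type(test))  # a NumPy array, not a pandas Series
df["test"] = test
print(df)
```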

Answer 2

I'd recommend not using apply and sticking with your second solution (where), but since you specifically asked for it, you can do the following: replace dataset_operators inside your lambda with x.

df['test'] = df.apply(lambda x: x['out_calls_cnt'] / x['total_records']
                                 if x['out_calls_cnt'] > 10 else 0, axis=1)
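Why this version works: with axis=1, apply passes each row to the function as a Series, so x['out_calls_cnt'] is a single number and the if/else is unambiguous. The sketch below uses a named function (ratio_if_busy, a hypothetical name) in place of the lambda, runs it on made-up data, and compares the result with the vectorized where version from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "out_calls_cnt": [47, 89, 6, 29],
    "total_records": [117, 227, 20, 70],
})

def ratio_if_busy(row):
    # With axis=1, `row` is one row as a Series, so these lookups are scalars.
    if row["out_calls_cnt"] > 10:
        return row["out_calls_cnt"] / row["total_records"]
    return 0

df["test"] = df.apply(ratio_if_busy, axis=1)

# Vectorized equivalent from the question, for comparison.
expected = (df["out_calls_cnt"] / df["total_records"]).where(df["out_calls_cnt"] > 10, 0)
print((df["test"] - expected).abs().max())
```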


2 solutions

Solution 1
4 votes, 2021-04-18 11:56:25

Solution 2
3 votes, 2021-04-18 11:48:36
