Check array values and add the resulting array as a column to a pandas DataFrame

Question

I need to add an array as a column to a DataFrame:

results['TEST'] = results.apply(lambda x: results_02, axis=1)

As a result I get a DataFrame like this:

ID TEST
1  [1,2,3,4,5,6,7,8,9,10]
2  [1,2,3,4,5,6,7,8,9,10]
3  [1,2,3,4,5,6,7,8,9,10]
4  [1,2,3,4,5,6,7,8,9,10]
5  [1,2,3,4,5,6,7,8,9,10]
6  [1,2,3,4,5,6,7,8,9,10]

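But I want to add a condition that checks whether results['ID'] is in results_02 and, if it is, adds every value except that one to the row. I need to do this for every row.

So the resulting DataFrame needs to look like this: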

ID TEST
1  [2,3,4,5,6,7,8,9,10]
2  [1,3,4,5,6,7,8,9,10]
3  [1,2,4,5,6,7,8,9,10]
4  [1,2,3,5,6,7,8,9,10]
5  [1,2,3,4,6,7,8,9,10]
6  [1,2,3,4,5,7,8,9,10]

I thought I could use:

results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)

But I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the best and most efficient way to solve this?

EDIT_1: DF

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]

Answer 1

You can try this:

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results

----------------------------------------------
    ID       TEST
0   250274  [244473, 240274, 247178, 248667]
1   244473  [250274, 240274, 247178, 248667]
2   240274  [250274, 244473, 247178, 248667]
3   247178  [250274, 244473, 240274, 248667]
4   248667  [250274, 244473, 240274, 247178]
----------------------------------------------
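
The comparison above relies on NumPy broadcasting: results.values has shape (5, 1) and result_02 has shape (5,), so != produces a 5x5 boolean matrix whose row i marks the entries of result_02 that differ from the i-th ID. A minimal sketch to inspect this on a smaller input (the names ids and candidates are illustrative, not from the original post):

```python
import numpy as np

ids = np.array([[250274], [244473], [240274]])    # column vector, shape (3, 1)
candidates = np.array([250274, 244473, 240274])   # shape (3,)

# Broadcasting compares every ID with every candidate: result has shape (3, 3)
mask = ids != candidates
print(mask.shape)
print(mask)

# Each mask row selects the candidates that differ from that row's ID
rows = [candidates[m].tolist() for m in mask]
print(rows)
```

Here .tolist() is only used to get plain Python lists, as in the desired output of the question; the answer's code keeps NumPy arrays instead.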

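If your DataFrame contains multiple columns and you are only interested in the ID column, you have to build the mask from that column alone, reshaping the ID array into a column vector so that it broadcasts against result_02.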

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
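
To see why the reshape matters, the sketch below compares the mask with and without it, reusing the data from the answer. The variable name flat is an illustrative addition. Without the reshape, the two one-dimensional arrays are compared element by element, which gives a single row of booleans rather than one mask row per ID.

```python
import numpy as np
import pandas as pd

results = pd.DataFrame({
    'ID': [250274, 244473, 240274, 247178, 248667],
    'some_col': ['A', 'B', 'C', 'D', 'E'],
})
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# Element-wise comparison of two 1-D arrays: shape (5,), not what we want
flat = results.ID.values != result_02
# Column vector against row vector: shape (5, 5), one mask row per ID
mask = results.ID.values.reshape(-1, 1) != result_02
print(flat.shape, mask.shape)

results['TEST'] = [result_02[mask_row].tolist() for mask_row in mask]
print(results)
```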

Edit

I'm not sure what you mean by your comment. I think you want something like this?

import numpy as np
import pandas as pd

data = {
    'ID1': [250274, 244473, 240274, 247178, 248667],
    'ID2': [244473, 240274, 247178, 248667, 250274],
}



results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# np.isin replaces np.in1d, which is deprecated since NumPy 2.0
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]

------------------------------------------------
    ID1     ID2     TEST
0   250274  244473  [240274, 247178, 248667]
1   244473  240274  [250274, 247178, 248667]
2   240274  247178  [250274, 244473, 248667]
3   247178  248667  [250274, 244473, 240274]
4   248667  250274  [244473, 240274, 247178]
------------------------------------------------

If not, please make your comment more precise.
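
For a single row, the membership filter from the edit can be sketched as follows. This uses np.isin, the current NumPy name for the membership test that np.in1d performs; the names row, present and remaining are illustrative only.

```python
import numpy as np

result_02 = np.array([250274, 244473, 240274, 247178, 248667])
row = np.array([250274, 244473])      # values of ID1 and ID2 for one row

present = np.isin(result_02, row)     # True where the value appears in the row
remaining = result_02[~present]       # keep only values absent from the row
print(remaining.tolist())  # [240274, 247178, 248667]
```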

Answer 2

I used this solution:

results['RESULTS'] = results['ID'].apply(lambda x: [i for i in result_02 if x!=i])
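
For context, the attempt in the question most likely fails for two reasons. With axis=1, the lambda receives a whole row as a Series, so the membership test ends up evaluating the truth value of a Series. Separately, list.remove mutates the list in place and returns None, so it cannot be sliced or chained. Applying the function to the ID column alone passes scalars instead. A self-contained sketch using the data from EDIT_1 (the name tmp is illustrative):

```python
import pandas as pd

results = pd.DataFrame({'ID': [250274, 244473, 240274, 247178, 248667]})
result_02 = [250274, 244473, 240274, 247178, 248667]

# Series.apply passes one scalar ID at a time, so x != i is a plain comparison
results['RESULTS'] = results['ID'].apply(
    lambda x: [i for i in result_02 if x != i]
)
print(results)

# list.remove works in place and returns None, which is why chaining it fails
tmp = list(result_02)
print(tmp.remove(250274))  # None
```

The list comprehension builds a new list for each row and leaves result_02 untouched, which avoids the side effects that remove would have on a shared list.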
