
Check array values and add the resulting array as a column to a pandas DataFrame

I need to add an array as a column to a DataFrame:

results['TEST'] = results.apply(lambda x: results_02, axis=1)

As a result I get a DataFrame like this:

ID TEST
1  [1,2,3,4,5,6,7,8,9,10]
2  [1,2,3,4,5,6,7,8,9,10]
3  [1,2,3,4,5,6,7,8,9,10]
4  [1,2,3,4,5,6,7,8,9,10]
5  [1,2,3,4,5,6,7,8,9,10]
6  [1,2,3,4,5,6,7,8,9,10]

But I want to add a condition: check whether results['ID'] is in results_02 and, if it is, put every value except that one into the row. I need to do this for each row.

So the resulting DataFrame needs to look like this:

ID TEST
1  [2,3,4,5,6,7,8,9,10]
2  [1,3,4,5,6,7,8,9,10]
3  [1,2,4,5,6,7,8,9,10]
4  [1,2,3,5,6,7,8,9,10]
5  [1,2,3,4,6,7,8,9,10]
6  [1,2,3,4,5,7,8,9,10]

I thought I could use:

results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)

But I get the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the best and most efficient way to solve this?

EDIT_1: sample DataFrame

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]

You can try this:

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results

----------------------------------------------
    ID       TEST
0   250274  [244473, 240274, 247178, 248667]
1   244473  [250274, 240274, 247178, 248667]
2   240274  [250274, 244473, 247178, 248667]
3   247178  [250274, 244473, 240274, 248667]
4   248667  [250274, 244473, 240274, 247178]
----------------------------------------------

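If your DataFrame contains multiple columns and you are only interested in the ID column, you have to build the mask from that column alone by reshaping the ID array into a column vector.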

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
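
Both masks above rely on NumPy broadcasting: comparing a column of shape (n, 1) with an array of shape (m,) gives an (n, m) boolean matrix, one mask row per DataFrame row. A minimal sketch to inspect that, assuming a small made-up input (the names `ids` and `candidates` are only for illustration and are not from the question):

```python
import numpy as np

# Hypothetical small inputs, not taken from the question's data.
ids = np.array([1, 2, 3])
candidates = np.array([1, 2, 3, 4])

# (3, 1) != (4,) broadcasts to a (3, 4) boolean matrix.
mask = ids.reshape(-1, 1) != candidates

# Each mask row selects the candidates that differ from that row's ID.
rows = [candidates[m].tolist() for m in mask]

print(mask.shape)  # (3, 4)
print(rows)        # [[2, 3, 4], [1, 3, 4], [1, 2, 4]]
```

Note that a candidate value with no matching ID (4 here) simply stays in every row.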

Edit

I'm not sure what you mean by your comment. I guess you want something like this?

import numpy as np
import pandas as pd

data = {
    'ID1': [250274, 244473, 240274, 247178, 248667],
    'ID2': [244473, 240274, 247178, 248667, 250274],
}



results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# np.isin is used here because np.in1d is deprecated since NumPy 2.0
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]

------------------------------------------------
    ID1     ID2     TEST
0   250274  244473  [240274, 247178, 248667]
1   244473  240274  [250274, 247178, 248667]
2   240274  247178  [250274, 244473, 248667]
3   247178  248667  [250274, 244473, 240274]
4   248667  250274  [244473, 240274, 247178]
------------------------------------------------
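
The list comprehension above tests, for each row, which entries of the array occur anywhere in that row and keeps the rest. A small sketch of that single step, written with `np.isin` and made-up values (`candidates` and `row` are illustrative names only):

```python
import numpy as np

candidates = np.array([10, 20, 30, 40])
row = np.array([20, 40])  # hypothetical values of ID1 and ID2 in one row

# True where a candidate occurs somewhere in the row.
present = np.isin(candidates, row)

# Inverting the mask keeps only the candidates absent from the row.
remaining = candidates[~present]

print(present.tolist())    # [False, True, False, True]
print(remaining.tolist())  # [10, 30]
```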

If not, please make your comment more precise.

I used this solution:

results['RESULTS'] = results['ID'].apply(lambda x: [i for i in result_02 if x!=i])
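
For reference, here is a self-contained version of that one-liner with the sample data from the question. It is only a sketch, and the explanation of the original error is my reading of it: with `axis=1`, `DataFrame.apply` passes each row as a Series, so `x not in results_02` ends up asking for the truth value of a Series, which raises the ValueError. Applying the function to the `ID` column passes scalars instead. Separately, `list.remove` returns None, so slicing its result would fail as well.

```python
import pandas as pd

results = pd.DataFrame({'ID': [250274, 244473, 240274, 247178, 248667]})
result_02 = [250274, 244473, 240274, 247178, 248667]

# Series.apply hands each ID to the lambda as a scalar,
# so the comparison x != i is an ordinary boolean.
results['RESULTS'] = results['ID'].apply(
    lambda x: [i for i in result_02 if x != i]
)

print(results['RESULTS'].iloc[0])  # [244473, 240274, 247178, 248667]
```

This is plain Python looping per row, so for large inputs the broadcasting mask from the answer above is likely faster.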
