
Check array values and add the resulting array as a column to a pandas DataFrame

I need to add an array as a column to a DataFrame:

results['TEST'] = results.apply(lambda x: results_02, axis=1)

As a result I get a DataFrame like this:

ID TEST
1  [1,2,3,4,5,6,7,8,9,10]
2  [1,2,3,4,5,6,7,8,9,10]
3  [1,2,3,4,5,6,7,8,9,10]
4  [1,2,3,4,5,6,7,8,9,10]
5  [1,2,3,4,5,6,7,8,9,10]
6  [1,2,3,4,5,6,7,8,9,10]

But I want to add a condition: if results['ID'] is in results_02, add all values except that ID to the row, and I need to do this for every row.

So the resulting DataFrame needs to look like this:

ID TEST
1  [2,3,4,5,6,7,8,9,10]
2  [1,3,4,5,6,7,8,9,10]
3  [1,2,4,5,6,7,8,9,10]
4  [1,2,3,5,6,7,8,9,10]
5  [1,2,3,4,6,7,8,9,10]
6  [1,2,3,4,5,7,8,9,10]

I thought I could do it using:

results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)

But I'm getting this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the best and most efficient way to solve this problem?

EDIT_1: DF

import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]

You can try this:

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667]}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results

----------------------------------------------
    ID       TEST
0   250274  [244473, 240274, 247178, 248667]
1   244473  [250274, 240274, 247178, 248667]
2   240274  [250274, 244473, 247178, 248667]
3   247178  [250274, 244473, 240274, 248667]
4   248667  [250274, 244473, 240274, 247178]
----------------------------------------------
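
The mask relies on NumPy broadcasting: results.values has shape (5, 1) and result_02 has shape (5,), so the comparison produces a (5, 5) boolean matrix in which row i is False only where result_02 equals the ID of row i. A minimal sketch with small made-up values (the names ids and pool are illustrative and do not come from the question):

```python
import numpy as np

ids = np.array([[1], [2], [3]])   # column vector, shape (3, 1), like results.values
pool = np.array([1, 2, 3])        # shape (3,), like result_02

# Broadcasting compares every ID against every pool value -> shape (3, 3)
mask = ids != pool
print(mask.shape)

# Boolean indexing keeps the pool values that differ from the row's ID
rows = [pool[mask_row].tolist() for mask_row in mask]
print(rows)
```

Each mask row is then used to index the full array, which is what the list comprehension in the answer does.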

If your DataFrame contains several columns and you are only interested in the ID column, build the mask from that column alone by reshaping the ID array into a column vector.

import numpy as np
import pandas as pd

data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}

results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
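
The reshape matters because, with more than one column, results.values is no longer an (n, 1) array and would not broadcast against result_02 as intended. The following is a shortened sketch of the same idea with three of the IDs; it assumes that indexing with [:, None] is an acceptable stand-in for .values.reshape(-1, 1), and the name id_column is illustrative:

```python
import numpy as np
import pandas as pd

results = pd.DataFrame({
    'ID': [250274, 244473, 240274],
    'some_col': ['A', 'B', 'C'],
})
result_02 = np.array([250274, 244473, 240274])

# [:, None] adds a trailing axis, giving the same (n, 1) shape as reshape(-1, 1)
id_column = results['ID'].to_numpy()[:, None]
mask = id_column != result_02

# Store plain lists so each cell prints without NumPy array formatting
results['TEST'] = [result_02[mask_row].tolist() for mask_row in mask]
print(results)
```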

EDIT

I am not quite sure what you mean by your comment. I suppose you want something like this?

import numpy as np
import pandas as pd

data = {
    'ID1': [250274, 244473, 240274, 247178, 248667],
    'ID2': [244473, 240274, 247178, 248667, 250274],
}



results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# np.isin is the recommended replacement for np.in1d in current NumPy releases
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]

------------------------------------------------
    ID1     ID2     TEST
0   250274  244473  [240274, 247178, 248667]
1   244473  240274  [250274, 247178, 248667]
2   240274  247178  [250274, 244473, 248667]
3   247178  248667  [250274, 244473, 240274]
4   248667  250274  [244473, 240274, 247178]
------------------------------------------------
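
The row-wise filter is a membership test: flag every element of result_02 that occurs anywhere in the row, then negate the flags. np.isin performs that test (np.in1d is its older spelling). A small sketch with made-up values, where pool and row are illustrative names:

```python
import numpy as np

pool = np.array([10, 20, 30, 40])
row = np.array([20, 40])          # values from one DataFrame row (e.g. ID1, ID2)

present = np.isin(pool, row)      # True where a pool value occurs in the row
remaining = pool[~present]        # keep only the values not found in the row
print(remaining.tolist())
```

Unlike the broadcasting mask, this handles any number of ID columns per row without reshaping.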

If not, please make your comment more precise.

I used this solution:

results['RESULTS'] = results['ID'].apply(lambda x: [i for i in result_02 if x!=i])
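
This works where the attempt in the question failed because Series.apply passes each ID to the lambda as a scalar, whereas DataFrame.apply(..., axis=1) passes the whole row as a Series, and testing a Series with "in" against a list raises the ambiguous-truth-value error. In addition, list.remove changes the list in place and returns None, so it cannot be sliced. A self-contained sketch using the data from EDIT_1:

```python
import pandas as pd

results = pd.DataFrame({'ID': [250274, 244473, 240274, 247178, 248667]})
result_02 = [250274, 244473, 240274, 247178, 248667]

# Series.apply hands the lambda one scalar ID at a time,
# so x != i is an ordinary element-wise comparison
results['RESULTS'] = results['ID'].apply(
    lambda x: [i for i in result_02 if x != i]
)
print(results)
```

If an ID does not occur in result_02, the comprehension simply returns the full list for that row, which matches the behaviour asked for in the question.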
