Check array values and add the resulting array as a column to a pandas DataFrame
I need to add an array as a column to a DataFrame:
results['TEST'] = results.apply(lambda x: results_02, axis=1)
As a result, I get a DataFrame like this:
ID TEST
1 [1,2,3,4,5,6,7,8,9,10]
2 [1,2,3,4,5,6,7,8,9,10]
3 [1,2,3,4,5,6,7,8,9,10]
4 [1,2,3,4,5,6,7,8,9,10]
5 [1,2,3,4,5,6,7,8,9,10]
6 [1,2,3,4,5,6,7,8,9,10]
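But I want to add a condition that checks whether results['ID'] is in results_02 and, if it is, adds every value except that one to the row. I need to do this for each row.
So the resulting DataFrame needs to look like this: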
ID TEST
1 [2,3,4,5,6,7,8,9,10]
2 [1,3,4,5,6,7,8,9,10]
3 [1,2,4,5,6,7,8,9,10]
4 [1,2,3,5,6,7,8,9,10]
5 [1,2,3,4,6,7,8,9,10]
6 [1,2,3,4,5,7,8,9,10]
I thought I could use:
results['TEST'] = results.apply(lambda x: results_02[:10] if x not in results_02[:10] else results_02.remove(x)[:10], axis=1)
But I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
What is the best and most efficient way to solve this?
EDIT_1: DataFrame
data = {'ID': [250274, 244473, 240274, 247178, 248667]}
df = pd.DataFrame(data)
results_02 = [250274, 244473, 240274, 247178, 248667]
You can try this:
import numpy as np
import pandas as pd
data = {'ID': [250274, 244473, 240274, 247178, 248667]}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
mask = results.values != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
results
----------------------------------------------
ID TEST
0 250274 [244473, 240274, 247178, 248667]
1 244473 [250274, 240274, 247178, 248667]
2 240274 [250274, 244473, 247178, 248667]
3 247178 [250274, 244473, 240274, 248667]
4 248667 [250274, 244473, 240274, 247178]
----------------------------------------------
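If your DataFrame contains multiple columns and you are only interested in the ID column, you have to build the mask from that column alone, reshaping the ID array into a column vector.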
import numpy as np
import pandas as pd
data = {'ID': [250274, 244473, 240274, 247178, 248667], 'some_col': ['A', 'B', 'C', 'D', 'E']}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
mask = results.ID.values.reshape(-1, 1) != result_02
results['TEST'] = [result_02[mask_row] for mask_row in mask]
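A minimal sketch of why the reshape matters, assuming the same sample data as above. Comparing an array of shape (n, 1) with one of shape (m,) broadcasts to an (n, m) boolean matrix, so each row of the mask corresponds to one DataFrame row. The names `ids`, `flat`, and `filtered` are used only for illustration.

```python
import numpy as np

ids = np.array([250274, 244473, 240274, 247178, 248667])
result_02 = np.array([250274, 244473, 240274, 247178, 248667])

# (5, 1) compared with (5,) broadcasts to a (5, 5) boolean matrix
mask = ids.reshape(-1, 1) != result_02
print(mask.shape)

# Without the reshape the comparison is element-wise and gives a 1-D mask
flat = ids != result_02
print(flat.shape)

# Each row of the 2-D mask keeps every value except that row's ID
filtered = [result_02[mask_row] for mask_row in mask]
print(filtered[0])
```

Without the reshape there would be a single mask for the whole column rather than one mask per row.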
I'm not sure what your comment means. I think you want something like this?
import numpy as np
import pandas as pd
data = {
'ID1': [250274, 244473, 240274, 247178, 248667],
'ID2': [244473, 240274, 247178, 248667, 250274],
}
results = pd.DataFrame(data)
result_02 = np.array([250274, 244473, 240274, 247178, 248667])
# np.isin is used here in place of np.in1d, which is deprecated in recent NumPy releases
results['TEST'] = [result_02[~np.isin(result_02, row)] for row in results.values]
------------------------------------------------
ID1 ID2 TEST
0 250274 244473 [240274, 247178, 248667]
1 244473 240274 [250274, 247178, 248667]
2 240274 247178 [250274, 244473, 248667]
3 247178 248667 [250274, 244473, 240274]
4 248667 250274 [244473, 240274, 247178]
------------------------------------------------
If not, please make your comment more precise.
I used this solution:
results['RESULTS'] = results['ID'].apply(lambda x: [i for i in result_02 if x!=i])
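For context, a sketch of why this works where the attempt from the question did not, as far as I can tell. With `axis=1`, `DataFrame.apply` passes a whole row (a Series) to the lambda, so `x not in results_02` ends up asking for the truth value of a Series. Separately, `list.remove` mutates the list in place and returns `None`. Applying the function to the `ID` column alone passes one scalar at a time. The helper name `others` is hypothetical and not part of the original solution.

```python
import pandas as pd

results = pd.DataFrame({'ID': [250274, 244473, 240274, 247178, 248667]})
result_02 = [250274, 244473, 240274, 247178, 248667]

def others(value, pool):
    """Return a new list with every element of pool except value."""
    return [item for item in pool if item != value]

# Series.apply hands over one scalar ID per call, so the comparison is unambiguous
results['RESULTS'] = results['ID'].apply(lambda x: others(x, result_02))
print(results)
```

Building a new list for each row also leaves `result_02` itself unchanged, which an in-place `remove` would not.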