Find intersection between rows
If I want to find the difference between two consecutive rows in a pandas DataFrame, I can simply call the diff function.
I have rows that contain sets of characters. What I want to do now is compute the intersection of the sets in each pair of consecutive rows. In other words, I'd like to use diff, but supply my own function instead. Is there a way to accomplish this in pandas?
Example input:
100118231 1 set([])
2 set([142.136.6])
3 set([142.136.6])
4 set([])
5 set([])
6 set([108.0.239])
Desired output:
100118231 1 set([]) NaN
2 set([142.136.6]) set([])
3 set([142.136.6]) {142.136.6}
4 set([]) set([])
5 set([]) set([])
6 set([108.0.239]) set([])
I've tried using shift, but it throws an error:
In [213]: type(tgr.head(1))
Out[213]: pandas.core.frame.DataFrame
In [214]: tt=tgr.apply(lambda x: x['value'].intersection((x['value'].shift(-1))))
AttributeError: 'Series' object has no attribute 'intersection'
The & operator is applied to all the items element-wise, so there is no need to involve lambdas and the like.
> import pandas as pd
> df = pd.DataFrame(['hi', set([142, 136, 6]), set([142, 137, 6]), set([0, 6])]).iloc[1:]
> df & df.shift(1)
0
1 NaN
2 set([142, 6])
3 set([6])
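
As a hedged alternative that does not depend on how a particular pandas version applies & to a column of Python objects, the same row-wise intersection can be sketched in plain Python. The helper name pairwise_intersection and the sample column name value are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd


def pairwise_intersection(series):
    """Intersect each set with the set in the previous row.

    Hypothetical helper: the first row has no predecessor, so it gets NaN,
    mirroring what diff does for the first row.
    """
    values = series.tolist()
    if not values:
        return pd.Series([], index=series.index, dtype=object)
    # Pair every row with the one before it and intersect the two sets.
    out = [float("nan")] + [cur & prev for prev, cur in zip(values, values[1:])]
    return pd.Series(out, index=series.index, dtype=object)


# Sample data shaped like the question's input (column name is assumed).
df = pd.DataFrame(
    {"value": [set(), {"142.136.6"}, {"142.136.6"}, set(), set(), {"108.0.239"}]},
    index=range(1, 7),
)
df["common"] = pairwise_intersection(df["value"])
result = df["common"].tolist()
```

On this sample, only row 3 shares an element with its predecessor, so its entry is the set containing 142.136.6; the first entry is NaN and the rest are empty sets. If the data is grouped by an identifier, the helper would need to be applied per group.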