Intersection of rows of a DataFrame based on the value in a column in the DataFrame
I have a df as shown below. I am trying to find the intersection of rows based on the value of the host column.
host values
test ['A','B','C','D']
test ['D','E','B','F']
prod ['1','2','A','D','E']
prod []
prod ['2']
The expected output is the intersection of a row with the next row if the host value is the same. For the above df, the output would be
test=['B','D'] - intersection of row 1 and 2
prod=[] - intersection of row 3 and 4
prod=[] - intersection of row 4 and 5
The intersection of rows 2 and 3 is not performed because the host column values don't match. Any help is appreciated.
The df.to_dict() value is
{'host': {0: 'test', 1: 'test', 2: 'prod', 3: 'prod', 4: 'prod'},
'values': {0: ['A', 'B', 'C', 'D'],
1: ['D', 'E', 'B', 'F'],
2: ['1', '2', 'A', 'D', 'E'],
3: [],
4: ['2']}
}
I'm not sure what structure you expect for the result, but you could create a column holding the previous row's values within each host group, using groupby and shift. Then use apply on the rows where this new column is notna, and take the intersection of the two sets.
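# Previous row's list within each host group (NaN for the first row of a group)
df['val_shift'] = df.groupby('host')['values'].shift()
# Intersect each row with its predecessor; rows without one stay NaN
df['intersect'] = df[df['val_shift'].notna()]\
    .apply(lambda x: list(set(x['values']) & set(x['val_shift'])), axis=1)
print(df)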
   host           values        val_shift intersect
0  test     [A, B, C, D]              NaN       NaN
1  test     [D, E, B, F]     [A, B, C, D]    [B, D]
2  prod  [1, 2, A, D, E]              NaN       NaN
3  prod               []  [1, 2, A, D, E]        []
4  prod              [2]               []        []
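One caveat with the groupby approach: shift within a group pairs each row with the previous row of the same host, even if rows for another host sit in between. If only strictly adjacent rows should be compared, as the question describes, something like the following sketch might be closer to the expected output. The helper name adjacent_intersections is made up for this illustration, and sorted() is used only because the order of a set is not guaranteed.

```python
import pandas as pd

# Same data as the df.to_dict() in the question
df = pd.DataFrame({
    "host": ["test", "test", "prod", "prod", "prod"],
    "values": [
        ["A", "B", "C", "D"],
        ["D", "E", "B", "F"],
        ["1", "2", "A", "D", "E"],
        [],
        ["2"],
    ],
})


def adjacent_intersections(frame):
    """Hypothetical helper: (host, sorted common items) for each pair of
    physically adjacent rows that share the same host value."""
    hosts = frame["host"].tolist()
    values = frame["values"].tolist()
    pairs = []
    # Walk over (row i, row i+1) pairs in their original order
    for h_prev, h_cur, v_prev, v_cur in zip(hosts, hosts[1:], values, values[1:]):
        if h_prev == h_cur:
            # sorted() makes the output order deterministic
            pairs.append((h_cur, sorted(set(v_prev) & set(v_cur))))
    return pairs


result = adjacent_intersections(df)
for host, common in result:
    print(f"{host}={common}")
```

For the sample df the two approaches agree, since each host's rows are contiguous; they would only differ if the same host appeared in separate, non-adjacent blocks.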