Frozenset union of two columns
I have a dataset containing two columns of frozensets. I would like to take the union of these frozensets row by row. I can do this with a for loop, but my dataset contains more than 27 million rows, so I am looking for a way to avoid the for loop. Any thoughts?
Data
import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])],
'ID2': [frozenset(['c', 'g']), frozenset(['i','f']), frozenset(['t','l'])]}
df = pd.DataFrame(data=d)
Code with for loop
from functools import reduce
df['frozenset']=0
for i in range(len(df)):
    df['frozenset'].iloc[i] = reduce(frozenset.union, [df['ID1'][i],df['ID2'][i]])
Desired output
ID1 ID2 frozenset
0 (a, b) (c, g) (a, c, g, b)
1 (a, c) (f, i) (a, c, f, i)
2 (c, d) (t, l) (c, d, t, l)
It doesn't seem like you need to use functools.reduce here. Doing a direct union with each pair of frozensets should suffice.
If you want the most speed possible for this sort of operation, I recommend taking a look at list comprehensions (see "For loops with pandas - When should I care?" for an exhaustive discussion).
df['union'] = [x | y for x, y in zip(df['ID1'], df['ID2'])]
df
ID1 ID2 union
0 (a, b) (c, g) (c, a, b, g)
1 (c, a) (f, i) (c, a, i, f)
2 (c, d) (l, t) (c, l, d, t)
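As a quick sanity check, the sketch below compares the list comprehension against a plain row-by-row loop on the sample data. It assumes only that pandas is installed; the names `union_fast`, `union_loop` and `same` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID1': [frozenset(['a', 'b']), frozenset(['a', 'c']), frozenset(['c', 'd'])],
    'ID2': [frozenset(['c', 'g']), frozenset(['i', 'f']), frozenset(['t', 'l'])],
})

# Pairwise union with a list comprehension, as in the answer above.
union_fast = [x | y for x, y in zip(df['ID1'], df['ID2'])]

# Reference result built row by row, without writing into the DataFrame.
union_loop = []
for i in range(len(df)):
    union_loop.append(df['ID1'].iloc[i].union(df['ID2'].iloc[i]))

# Frozensets compare by content, so the display order of elements is irrelevant.
same = union_fast == union_loop
print(same)
```

Because frozensets are unordered, the printed element order can differ from the desired output in the question even though the sets are equal.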
If you want this to generalise for multiple columns, you can union them all using frozenset.union().
df['union2'] = [frozenset.union(*X) for X in df[['ID1', 'ID2']].values]
df
ID1 ID2 union union2
0 (a, b) (c, g) (c, a, b, g) (c, a, b, g)
1 (c, a) (f, i) (c, a, i, f) (c, a, i, f)
2 (c, d) (l, t) (c, l, d, t) (c, l, d, t)
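One possible way to wrap the multi-column version in a reusable function is sketched below. The helper `union_columns` and the third column `ID3` are hypothetical and not part of the original question. The sketch iterates over the columns with `zip` instead of `.values`, which is an alternative to the line above and not necessarily faster.

```python
import pandas as pd

def union_columns(frame, columns):
    """Hypothetical helper: row-wise union of the frozensets in `columns`."""
    # zip(*...) yields one tuple of frozensets per row.
    rows = zip(*(frame[c] for c in columns))
    # Starting from an empty frozenset also covers a single-column list.
    return [frozenset().union(*row) for row in rows]

df = pd.DataFrame({
    'ID1': [frozenset(['a', 'b']), frozenset(['a', 'c']), frozenset(['c', 'd'])],
    'ID2': [frozenset(['c', 'g']), frozenset(['i', 'f']), frozenset(['t', 'l'])],
    # ID3 is an invented extra column, added only to show more than two inputs.
    'ID3': [frozenset(['x']), frozenset(['a']), frozenset(['d', 'z'])],
})

df['union_all'] = union_columns(df, ['ID1', 'ID2', 'ID3'])
print(df['union_all'].tolist())
```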
You can try:
import pandas as pd
import numpy as np
d = {'ID1': [frozenset(['a', 'b']), frozenset(['a','c']), frozenset(['c','d'])],
'ID2': [frozenset(['c', 'g']), frozenset(['i','f']), frozenset(['t','l'])]}
df = pd.DataFrame(data=d)
from functools import reduce
df['frozenset']=0
add = []
for i in range(len(df)):
    df['frozenset'].iloc[i] = reduce(frozenset.union, [df['ID1'][i],df['ID2'][i]])
add.append(df)
print(add)