How to access two elements at once in a for loop without duplicates in Python?
I have a table that looks like this:
Celebrity | Username |
---|---|
A | user1 |
B | user1 |
C | user2 |
A | user3 |
A | user2 |
D | user2 |
D | user3 |
I wrote a function to find the overlap of users between two celebrities:
def num_of_fans_overlap(cel1,cel2,data,Celebrity,Usernames):
    l = [cel1,cel2]
    Res = len(data.loc[data['Usernames'].map(data.groupby('Usernames').agg(set)['Celebrity'].eq(set(l)))])/2
    return print(int(Res))
For example, if I run num_of_fans_overlap("A", "B", data, "Celebrity", "Username"), I get 1, which means one user follows both celebrities.
Now I want to run a for loop, and the output should look like this:
("A", "B", 1)
("A", "C", 1)
("A", "D", 2)
("B", "C", 0)
("B", "D", 0)
("C", "D", 1)
I have been stuck here. Hope someone can help.
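Use crosstab, then dot:
import numpy as np
import pandas as pd

s = pd.crosstab(df.Celebrity, df.Username)  # celebrity-by-user count table
s = s.dot(s.T)  # entry [a, b] = number of users shared by a and b
out = s.mask(np.triu(np.ones(s.shape)).astype(bool)).stack()  # keep only the lower triangle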
Out[301]:
Celebrity  Celebrity
B          A            1.0
C          A            1.0
           B            0.0
D          A            2.0
           B            0.0
           C            1.0
dtype: float64
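To get from that co-occurrence matrix to the tuples the question asks for, one option is to walk the pairs with itertools.combinations and look each count up in the matrix. This is only a minimal sketch: it assumes the column names Celebrity and Username from the table, and the helper name pairwise_overlaps is made up for illustration.

```python
from itertools import combinations

import pandas as pd


def pairwise_overlaps(df):
    """Return (celebrity_a, celebrity_b, shared_user_count) for every unordered pair."""
    # Rows: celebrities, columns: usernames, values: how often each pair occurs.
    table = pd.crosstab(df["Celebrity"], df["Username"])
    # Clip to 0/1 so a repeated (celebrity, user) row is not counted twice.
    table = table.clip(upper=1)
    # Entry [a, b] of the product is the number of users following both a and b.
    shared = table.dot(table.T)
    return [(a, b, int(shared.loc[a, b])) for a, b in combinations(shared.index, 2)]


# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Celebrity": ["A", "B", "C", "A", "A", "D", "D"],
    "Username": ["user1", "user1", "user2", "user3", "user2", "user2", "user3"],
})

for row in pairwise_overlaps(df):
    print(row)
```

On the sample table this prints the six tuples listed in the question, with integer counts instead of the floats that mask() introduces.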
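First, the function num_of_fans_overlap shouldn't return a print(). Return the value and let the caller print it:
def num_of_fans_overlap(cel1, cel2, data):
    l = [cel1, cel2]
    # Column names are kept as in the question's code ('Usernames', 'Celebrity').
    # Each matching user contributes two rows, hence the division by 2.
    Res = len(data.loc[data['Usernames'].map(data.groupby('Usernames').agg(set)['Celebrity'].eq(set(l)))]) / 2
    return int(Res)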
Second, let the variable celebrities be the list of unique values in the Celebrity column, and loop over its pairs:
from itertools import combinations

celebrities = list(data.Celebrity.unique())
for cel1, cel2 in combinations(celebrities, 2):
    fans_overlap = num_of_fans_overlap(cel1, cel2, data)
    print((cel1, cel2, fans_overlap))
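One caveat: the eq(set(l)) test in num_of_fans_overlap appears to match only users whose complete set of followed celebrities is exactly the pair. On the sample table, user2 follows A, C and D, so user2 would not count toward ("A", "C"), although the expected output in the question lists 1 for that pair. If the goal is to count every user who follows both celebrities, set intersection is one way to do it. The following is a plain-Python sketch, not the asker's method; the names rows, fans and shared_fans are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# (celebrity, username) rows copied from the table in the question.
rows = [
    ("A", "user1"), ("B", "user1"), ("C", "user2"), ("A", "user3"),
    ("A", "user2"), ("D", "user2"), ("D", "user3"),
]

# Map each celebrity to the set of users who follow them.
fans = defaultdict(set)
for celebrity, username in rows:
    fans[celebrity].add(username)


def shared_fans(cel1, cel2):
    """Count users who follow both celebrities, whoever else they follow."""
    return len(fans[cel1] & fans[cel2])


# Every unordered pair of celebrities, in sorted order.
results = [(a, b, shared_fans(a, b)) for a, b in combinations(sorted(fans), 2)]
for row in results:
    print(row)
```

With the sample rows this reproduces the six tuples from the question, including ("A", "D", 2).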
The naive way to do this:
celebs = ["A", "B", "C", "D"]
for i in range(len(celebs)):
    # Start j after i so each pair is visited only once.
    for j in range(i + 1, len(celebs)):
        celeb_pairs = (celebs[i], celebs[j])
        run_your_function_here(*celeb_pairs, other, parameters)
You can also do this elegantly by using the itertools.combinations function:
import itertools

celebs = ["A", "B", "C", "D"]
for celeb1, celeb2 in itertools.combinations(celebs, 2):
    run_your_function_here(celeb1, celeb2, other, parameters)
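As a quick sanity check, the index-based nested loop and itertools.combinations should visit the same pairs in the same order. A small sketch that collects both and compares them (the variable names nested_pairs and combo_pairs are just for illustration):

```python
from itertools import combinations

celebs = ["A", "B", "C", "D"]

# Index-based nested loop: j always starts after i, so no pair repeats.
nested_pairs = [
    (celebs[i], celebs[j])
    for i in range(len(celebs))
    for j in range(i + 1, len(celebs))
]

# combinations(celebs, 2) yields each unordered pair once, in input order.
combo_pairs = list(combinations(celebs, 2))

print(combo_pairs)
print(nested_pairs == combo_pairs)
```

Four celebrities give six pairs either way, so the choice between the two forms is mostly about readability.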