How to access two elements at once in a for loop without duplicates in Python?

I have a table that looks like this:

Celebrity  Username
A          user1
B          user1
C          user2
A          user3
A          user2
D          user2
D          user3

I wrote a function to find the overlap of users between two celebrities:

def num_of_fans_overlap(cel1,cel2,data,Celebrity,Usernames):
    l = [cel1,cel2]
    Res = len(data.loc[data['Usernames'].map(data.groupby('Usernames').agg(set)['Celebrity'].eq(set(l)))])/2
    return print(int(Res))

For example, if I run num_of_fans_overlap("A", "B", data, "Celebrity", "Username"), I get 1, which means one user follows both celebrities.

Now I want to run a for loop over every pair of celebrities, and the output should look like this:

("A", "B", 1)
("A", "C", 1)
("A", "D", 2)
("B", "C", 0)
("B", "D", 0)
("C", "D", 1)

I have been stuck here. Hope someone can help.

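Use crosstab, then dot. The product of the celebrity-by-user table with its transpose gives the number of shared users for every pair of celebrities: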

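import numpy as np
import pandas as pd

# Rows are celebrities, columns are users; a cell counts that pairing
s = pd.crosstab(df.Celebrity, df.Username)
s = s.dot(s.T)
# Mask the diagonal and upper triangle so each pair appears only once
out = s.mask(np.triu(np.ones(s.shape)).astype(bool)).stack()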
Out[301]: 
Celebrity  Celebrity
B          A            1.0
C          A            1.0
           B            0.0
D          A            2.0
           B            0.0
           C            1.0
dtype: float64
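
To get the tuples in the format the question asks for, the same crosstab/dot idea can be combined with np.triu_indices. This is only a sketch: it rebuilds the sample table from the question as df, and the names overlap, labels and pairs are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

# Sample table from the question (rebuilt here so the snippet is self-contained)
df = pd.DataFrame({
    "Celebrity": ["A", "B", "C", "A", "A", "D", "D"],
    "Username": ["user1", "user1", "user2", "user3", "user2", "user2", "user3"],
})

# Celebrity-by-user table, then celebrity-by-celebrity counts of shared users
s = pd.crosstab(df["Celebrity"], df["Username"])
overlap = s.dot(s.T)

# Indices of the strict upper triangle: each unordered pair exactly once
labels = overlap.index.to_list()
values = overlap.to_numpy()
rows, cols = np.triu_indices(len(labels), k=1)

pairs = [(labels[i], labels[j], int(values[i, j])) for i, j in zip(rows, cols)]
for pair in pairs:
    print(pair)
```

This assumes each (Celebrity, Username) row appears at most once; duplicated rows would inflate the counts.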

First, the function num_of_fans_overlap shouldn't return a print(). print() returns None, so the caller never receives the count; return the value and print it outside the function.

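def num_of_fans_overlap(cel1, cel2, data):
    # Set of celebrities followed by each user (the column is named 'Username')
    followed = data.groupby('Username')['Celebrity'].agg(set)
    # Count users who follow both celebrities, whoever else they follow;
    # an exact set match would miss e.g. user2 for the pair ("A", "D")
    return int(followed.map({cel1, cel2}.issubset).sum())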

Second, build celebrities as a list of the unique values in the Celebrity column, then loop over every pair of them with itertools.combinations:


from itertools import combinations
celebrities = list(data.Celebrity.unique())
for (cel1, cel2) in combinations(celebrities, 2):
    fans_overlap = num_of_fans_overlap(cel1, cel2, data)
    print((cel1, cel2, fans_overlap))
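
For comparison, here is a minimal sketch of the same pairwise loop without pandas. It swaps in a different technique, plain set intersection, rather than the function above; the rows list is the question's table typed in by hand, and fans and results are illustrative names.

```python
from collections import defaultdict
from itertools import combinations

# (celebrity, username) rows from the question's table
rows = [
    ("A", "user1"), ("B", "user1"), ("C", "user2"), ("A", "user3"),
    ("A", "user2"), ("D", "user2"), ("D", "user3"),
]

# Map each celebrity to the set of users who follow them
fans = defaultdict(set)
for celebrity, username in rows:
    fans[celebrity].add(username)

# One tuple per unordered pair: the size of the intersection is the overlap
results = [
    (cel1, cel2, len(fans[cel1] & fans[cel2]))
    for cel1, cel2 in combinations(sorted(fans), 2)
]
for result in results:
    print(result)
```

Building the sets once and intersecting them avoids regrouping the whole table for every pair.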

The naive way to do this is with two nested loops, where the inner loop starts just after the outer index:

celebs = ["A", "B", "C", "D"]
for i in range(len(celebs)):
    for j in range(i+1, len(celebs)):
        celeb_pairs = (celebs[i], celebs[j])
        run_your_function_here(*celeb_pairs, other, parameters)

You can also do this elegantly by using the itertools.combinations function:

import itertools

celebs = ["A", "B", "C", "D"]
for celeb1, celeb2 in itertools.combinations(celebs, 2):
    run_your_function_here(celeb1, celeb2, other, parameters)
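
As a quick check that the two approaches visit the same pairs, the sketch below collects both into lists and compares them. The names nested and paired are illustrative only.

```python
from itertools import combinations

celebs = ["A", "B", "C", "D"]

# Pairs from the nested index loops: j always starts after i
nested = [
    (celebs[i], celebs[j])
    for i in range(len(celebs))
    for j in range(i + 1, len(celebs))
]

# Pairs from itertools.combinations, in the same order
paired = list(combinations(celebs, 2))

print(nested == paired)
print(len(paired))
```

Both yield each unordered pair exactly once, so neither ("B", "A") nor ("A", "A") ever appears.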

