Python interaction between rows and columns of pandas dataframe

I have this dataframe:

print(df)
       exam       student
    0 French        a
    1 English       a
    2 Italian       a
    3 Chinese       b
    4 Russian       b
    5 German        b
    6 Chinese       c
    7 Spanish       c
    8 English       c
    9 French        c
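To reproduce the example, the frame above can be built directly. This is a minimal sketch that assumes the default integer index shown in the printout; the column values are copied from it.

```python
import pandas as pd

# Rebuild the example frame: one row per (exam, student) enrolment
df = pd.DataFrame({
    "exam": ["French", "English", "Italian",
             "Chinese", "Russian", "German",
             "Chinese", "Spanish", "English", "French"],
    "student": ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
})
print(df)
```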

I need to find, for each student, the number of other students who took at least one of the same exams.

The result should look like this:

  exam      student   total_st
0 French       a         1
1 English      a         1
2 Italian      a         1
3 Chinese      b         1
4 Russian      b         1
5 German       b         1
6 Chinese      c         2
7 Spanish      c         2
8 English      c         2
9 French       c         2

The total for student a is 1 because a has exams in common with only one other student (student c).

The total for student b is 1 because b has exams in common with only one other student (student c).

The total for student c is 2 because c has exams in common with both other students (students a and b).
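The definition in these three examples can be checked directly with sets, without any matrix algebra: collect each student's exams, then for each student count the other students whose exam set intersects theirs. This is a quadratic-time sketch meant only for small frames and for pinning down the expected numbers; the helper name `count_shared_students` is hypothetical.

```python
import pandas as pd

def count_shared_students(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: for each student, count the *other* students
    who share at least one exam with them."""
    # Map each student to the set of exams they took
    exams_by_student = df.groupby("student")["exam"].apply(set)
    counts = {}
    for s, exams in exams_by_student.items():
        # Count other students whose exam set overlaps this student's set
        counts[s] = sum(
            1 for t, other in exams_by_student.items()
            if t != s and exams & other
        )
    return pd.Series(counts, name="total_st")

df = pd.DataFrame({
    "exam": ["French", "English", "Italian",
             "Chinese", "Russian", "German",
             "Chinese", "Spanish", "English", "French"],
    "student": ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
})

# Attach the per-student count to every row of that student
df["total_st"] = df["student"].map(count_shared_students(df))
print(df)
```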

Any ideas?

Thank you in advance!

You can first build a contingency table of exam and student, then multiply its transpose by itself (a matrix product, not a cross product) to check whether any two students share an exam. Count, for each student, the other students who share at least one exam, and map the result back onto the original student column:

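import pandas as pd

# Contingency table: rows are exams, columns are students,
# cells count how often each student took each exam
cont_table = pd.crosstab(df.exam, df.student)

# cont_table.T.dot(cont_table) is a student-by-student table of how many exams
# each pair shares; the diagonal is each student's own exam count, so
# subtract 1 to exclude the student himself
shared_count = (cont_table.T.dot(cont_table) != 0).sum(0) - 1
shared_count

#student
#a    1
#b    1
#c    2
#dtype: int64

# Look up each row's student in shared_count
df['total_st'] = df.student.map(shared_count)
df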
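To see why the `- 1` is needed, it helps to print the intermediate student-by-student matrix. Its diagonal holds each student's own number of exams, so every student always "overlaps" with themselves; the off-diagonal cells hold the number of exams a pair shares. A small sketch that rebuilds the example frame and exposes that matrix (the name `co_occurrence` is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "exam": ["French", "English", "Italian",
             "Chinese", "Russian", "German",
             "Chinese", "Spanish", "English", "French"],
    "student": ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"],
})

# Rows: exams, columns: students, cells: enrolment counts (0 or 1 here)
cont_table = pd.crosstab(df.exam, df.student)

# (students x exams) dot (exams x students) -> shared-exam counts per pair.
# For this data: a-a 3, a-b 0, a-c 2, b-b 3, b-c 1, c-c 4
co_occurrence = cont_table.T.dot(cont_table)
print(co_occurrence)

# Nonzero cells per column, minus the diagonal entry for the student itself
shared_count = (co_occurrence != 0).sum(axis=0) - 1
print(shared_count)
```

Because the diagonal is always nonzero for a student who took at least one exam, subtracting 1 removes exactly that self-match, and a student who shares nothing ends up with 0.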

