[英]Python interaction between rows and columns in dataframe
I have a dataframe: 我有一个数据帧:
df = pd.DataFrame({
'exam': [
'French', 'English', 'German', 'Russian', 'Russian',
'German', 'German', 'French', 'English', 'French'
],
'student' : ['john', 'ted', 'jason', 'marc', 'peter', 'bob',
'robert', 'david', 'nik', 'kevin'
]
})
print (df)
exam student
0 French john
1 English ted
2 German jason
3 Russian marc
4 Russian peter
5 German bob
6 German robert
7 French david
8 English nik
9 French kevin
Does anybody know how to create a new dataframe containing two columns "student" and "student shared exam with". 有没有人知道如何创建一个包含两列“学生”和“学生共享考试”的新数据框。
I should get something like: 我应该得到类似的东西:
student shared_exam_with
0 john david
1 john kevin
2 ted nik
3 jason bob
4 jason robert
5 marc peter
6 peter marc
7 bob jason
8 bob robert
9 robert jason
10 robert bob
11 david john
12 david kevin
13 nik ted
14 kevin john
15 kevin david
For ex: John took French..and David and Kevin too! 对于前:约翰带法国人......还有大卫和凯文!
Any ideas? 有任何想法吗? Thank you in advance!
先感谢您!
self merge
自我
merge
df.merge(
df, on='exam',
suffixes=['', '_shared_with']
).query('student != student_shared_with')
exam student student_shared_with
1 French john david
2 French john kevin
3 French david john
5 French david kevin
6 French kevin john
7 French kevin david
10 English ted nik
11 English nik ted
14 German jason bob
15 German jason robert
16 German bob jason
18 German bob robert
19 German robert jason
20 German robert bob
23 Russian marc peter
24 Russian peter marc
self join
自我
join
d1 = df.set_index('exam')
d1.join(
d1, rsuffix='_shared_with'
).query('student != student_shared_with')
student student_shared_with
exam
English ted nik
English nik ted
French john david
French john kevin
French david john
French david kevin
French kevin john
French kevin david
German jason bob
German jason robert
German bob jason
German bob robert
German robert jason
German robert bob
Russian marc peter
Russian peter marc
itertools.permutations
+ groupby
itertools.permutations
+ groupby
from itertools import permutations as perm
cols = ['student', 'student_shared_with']
df.groupby('exam').student.apply(
lambda x: pd.DataFrame(list(perm(x, 2)), columns=cols)
).reset_index(drop=True)
student student_shared_with
0 ted nik
1 nik ted
2 john david
3 john kevin
4 david john
5 david kevin
6 kevin john
7 kevin david
8 jason bob
9 jason robert
10 bob jason
11 bob robert
12 robert jason
13 robert bob
14 marc peter
15 peter marc
One way would be: 一种方法是:
cross = pd.crosstab(df['student'], df['exam'])
res = cross.dot(cross.T)
res.where(np.triu(res, k=1).astype('bool')).stack()
Out:
student student
bob jason 1.0
robert 1.0
david john 1.0
kevin 1.0
jason robert 1.0
john kevin 1.0
marc peter 1.0
nik ted 1.0
dtype: float64
The dot product generates a binary matrix for the co occurrences. 点积产生共生的二元矩阵。 In order not to repeat the same pairs, I filter them with where and stack.
为了不重复相同的对,我用where和stack过滤它们。 The index of the resulting Series is the students that have the same exam.
得到的系列的索引是具有相同考试的学生。
This would be a one step process in SQL, but here it's two: (1) merge the DataFrame (on exam) with itself, and (2) get rid of rows were student == student_shared (since a student doesn't share with themself) 这将是SQL中的一个步骤,但这里有两个:(1)将DataFrame(在考试中)与自身合并,以及(2)删除行是student == student_shared(因为学生不与之分享自理)
df2 = pd.merge(
df, df, how='outer', on='exam', suffixes=['', '_shared_with']).drop('exam', axis=1)
df2 = df2.loc[df2.student != df2.student_shared_with]
student student_shared_with
1 john david
2 john kevin
3 david john
5 david kevin
6 kevin john
7 kevin david
10 ted nik
11 nik ted
14 jason bob
15 jason robert
16 bob jason
18 bob robert
19 robert jason
20 robert bob
23 marc peter
24 peter marc
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.