I have the following dataframe:
import pandas as pd
df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [4, 7]], columns=['group_id', 'student_id'])
Each student_id can appear multiple times, in different group_ids, alongside other student_ids.
I want to count how many times student x was in the same group as student y. In other words, I want an n x n DataFrame where each entry is the number of times two students have been in the same group (same group_id), filled with 0 when there is no match.
2 2 3 4 5 6 7
3 1 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 1
7 0 0 0 0 1 0
Is there a clever way to do this with SQL or Pandas?
Thanks
You can do it with numpy's outer:
import numpy as np
# compare every row's group_id with every other row's group_id
s = df.group_id.to_numpy()
yourdf = pd.DataFrame(np.equal.outer(s, s), index=df.student_id, columns=df.student_id).astype(int)
yourdf
Out[40]:
student_id 2 3 6 7
student_id
2 1 1 0 0
3 1 1 0 0
6 0 0 1 1
7 0 0 1 1
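The outer comparison above works row by row, so it gives one row and one column per row of df. As a rough sketch (the sample data and the names same and counts are made up for illustration), when a student appears in more than one group the duplicate labels can be collapsed by summing over both axes:

```python
import numpy as np
import pandas as pd

# Hypothetical data: student 2 appears in groups 1 and 2.
df = pd.DataFrame({'group_id':   [1, 1, 2, 2],
                   'student_id': [2, 3, 2, 6]})

s = df['group_id'].to_numpy()

# same[i, j] is 1 when row i and row j of df share a group_id.
same = pd.DataFrame(np.equal.outer(s, s).astype(int),
                    index=df['student_id'], columns=df['student_id'])

# Student 2 labels two rows and two columns; sum the duplicates on
# the rows first, then on the columns (via a transpose).
counts = same.groupby(level=0).sum().T.groupby(level=0).sum().T
print(counts)
```

Note that the diagonal then holds the number of groups each student belongs to, not zero.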
Or use crosstab and a dot product:
freq = pd.crosstab(df['group_id'],df['student_id'])
yourdf = freq.T.dot(freq)
Out[45]:
student_id 2 3 6 7
student_id
2 1 1 0 0
3 1 1 0 0
6 0 0 1 1
7 0 0 1 1
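To see why the dot product counts shared groups, here is a small check on the question's data. It assumes each (group_id, student_id) pair appears only once, so that freq contains only 0s and 1s; the names co and manual are just for illustration.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [4, 7]],
                  columns=['group_id', 'student_id'])

# Rows are groups, columns are students; 1 marks membership.
freq = pd.crosstab(df['group_id'], df['student_id'])

# Entry (a, b) of freq.T.dot(freq) sums freq[g, a] * freq[g, b] over all
# groups g, which is the number of groups containing both a and b.
co = freq.T.dot(freq)

# Cross-check the entry for students 2 and 3 by hand.
manual = sum(freq.loc[g, 2] * freq.loc[g, 3] for g in freq.index)
print(manual, co.loc[2, 3])
```

The diagonal entry for a student is the number of groups that student belongs to, which is why it shows 1 here.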
You can merge the DataFrame with itself on group_id and then use pivot_table:
df_ = (df.merge(df, on='group_id')
         .pivot_table(index='student_id_x', columns='student_id_y',
                      values='group_id', aggfunc='nunique').fillna(0)
         .astype(int)
       )
print(df_)
student_id_y 2 3 6 7
student_id_x
2 1 1 0 0
3 1 1 0 0
6 0 0 1 1
7 0 0 1 1
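If the self-pairs on the diagonal are not wanted, one possible variation (a sketch, not part of the answer above) is to drop the rows where both sides of the merge are the same student before counting. It assumes each student is listed at most once per group; pairs, students and counts are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [4, 6], [4, 7]],
                  columns=['group_id', 'student_id'])

# Self-merge: one row per ordered pair of students sharing a group.
pairs = df.merge(df, on='group_id')

# Drop pairs of a student with himself.
pairs = pairs[pairs['student_id_x'] != pairs['student_id_y']]

# Count the remaining pairs; reindex so every student gets a row and
# a column even if he never shared a group with anyone.
students = sorted(df['student_id'].unique())
counts = (pd.crosstab(pairs['student_id_x'], pairs['student_id_y'])
            .reindex(index=students, columns=students, fill_value=0))
print(counts)
```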
You can do:
# make dummy cols in the dataframe
df['student_id_2'] = df['student_id'].copy()
df['flag'] = 1
dx = (df
      # drop('group_id', 1) with a positional axis no longer works in pandas 2.x
      .drop(columns='group_id')
      .set_index(['student_id', 'student_id_2'])
      .unstack(-1)
      .fillna(0))
# fix column names
dx.columns.names = None, None
dx.columns = [x[1] for x in dx.columns]
print(dx)
2 3 6 7
student_id
2 1.0 0.0 0.0 0.0
3 0.0 1.0 0.0 0.0
6 0.0 0.0 1.0 0.0
7 0.0 0.0 0.0 1.0
To give a more instructive example (with a better-filled result), I prepared a slightly bigger source DataFrame:
group_id student_id
0 1 2
1 1 3
2 2 2
3 2 6
4 3 3
5 3 2
6 4 6
7 4 7
To get the result, run:
stId = df.student_id.unique()
result = pd.DataFrame(0, index=stId, columns=stId)
# each group is unpacked into (s1, s2), so this assumes exactly two students per group
for s1, s2 in df.groupby('group_id').student_id.apply(list):
    result.loc[s2, s1] += 1
    result.loc[s1, s2] += 1
When you print the result, you will get:
2 3 6 7
2 0 2 1 0
3 2 0 0 0
6 1 0 0 1
7 0 0 1 0
As you can see, the diagonal is all zeros. In my opinion, there is something wrong with any solution that shows a student as having been in a group with himself.
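For groups with more than two students, the same counting idea can be sketched with itertools.combinations. This is only an illustration on the bigger sample data from this answer; st_id and members are names introduced here.

```python
from itertools import combinations

import pandas as pd

# The bigger sample DataFrame shown above.
df = pd.DataFrame({'group_id':   [1, 1, 2, 2, 3, 3, 4, 4],
                   'student_id': [2, 3, 2, 6, 3, 2, 6, 7]})

st_id = sorted(df['student_id'].unique())
result = pd.DataFrame(0, index=st_id, columns=st_id)

# For each group, take its distinct members and count every unordered
# pair once in both directions; a student is never paired with himself.
for members in df.groupby('group_id')['student_id'].unique():
    for s1, s2 in combinations(members, 2):
        result.loc[s1, s2] += 1
        result.loc[s2, s1] += 1

print(result)
```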