Pivot Table:
COURSE ENGLISH MATH ART
STUDENT
StudentA 95.0 83.0 97.0
StudentB 91.0 93.0 47.0
StudentC 85.0 84.0 92.0
StudentD 97.0 84.0 85.0
StudentE 93.0 88.0 85.0
StudentAvg 94.5 83.7 96.9
For each subject, I want a list of the students who have a grade more than 5% lower than StudentAvg. So in this case I'd want something like:
English: StudentC
Math:
Art: StudentB, StudentD, StudentE
How can I do this in Pandas?
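To try the answers below, you can rebuild the pivot table as a DataFrame. This is a minimal sketch. It assumes the table above is the whole frame, with STUDENT as the index name and COURSE as the columns name, matching the printed output.

```python
import pandas as pd

# Rebuild the pivot table from the question: rows are students, columns are courses
df = pd.DataFrame(
    {
        "ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
        "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
        "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9],
    },
    index=pd.Index(
        ["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
        name="STUDENT",
    ),
)
df.columns.name = "COURSE"  # label shown above the column headers when printed
print(df)
```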
This returns a list of (student, subject) tuples for every grade that is more than 5% below that subject's average:
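import numpy as np
avg = df.loc['StudentAvg', :]  # one average per subject
# row/column positions where a grade is more than 5% below its subject's average
i, j = np.where(((df / avg) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))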
[('StudentB', 'ART'),
('StudentC', 'ENGLISH'),
('StudentC', 'ART'),
('StudentD', 'ART'),
('StudentE', 'ART')]
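All the StudentAvg values are positive, so the test `(grade / avg) - 1 < -.05` is the same as `grade < 0.95 * avg`. The closest call in the data is StudentC in ART (92.0 against 96.9). It falls just past the cutoff, which is why it appears in this output even though the expected output in the question leaves it out. The numbers below are taken from the table in the question.

```python
# Cutoff for ART: 5% below the StudentAvg value of 96.9
avg_art = 96.9
cutoff = 0.95 * avg_art  # roughly 92.055
grade = 92.0             # StudentC's ART grade

ratio_test = (grade / avg_art) - 1 < -0.05  # the form used in the answer
cutoff_test = grade < cutoff                 # equivalent form when avg_art > 0
pct_below = (grade / avg_art - 1) * 100      # about -5.06 percent

print(round(cutoff, 3), round(pct_below, 2), ratio_test, cutoff_test)
```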
We can speed this up a bit by working on the underlying NumPy array instead of the DataFrame:
p = df.index.get_loc('StudentAvg')  # position of the average row
v = df.values
# v[p] broadcasts across every row of v
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
[('StudentB', 'ART'),
('StudentC', 'ENGLISH'),
('StudentC', 'ART'),
('StudentD', 'ART'),
('StudentE', 'ART')]
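The question asks for the students grouped by subject, not a flat list of pairs. The following minimal sketch regroups the pairs from the NumPy version into a dict keyed by subject and keeps an empty list for MATH. It assumes the same frame as in the question, and `by_subject` is only an illustrative name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
     "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
     "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9]},
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)

# Same NumPy test as above: positions more than 5% below the StudentAvg row
p = df.index.get_loc("StudentAvg")
v = df.to_numpy()
i, j = np.where((v / v[p]) - 1 < -0.05)

# Group the (student, subject) pairs by subject, keeping subjects with no matches
by_subject = {col: [] for col in df.columns}
for student, subject in zip(df.index[i], df.columns[j]):
    by_subject[subject].append(student)

for subject, students in by_subject.items():
    print(f"{subject}: {', '.join(students)}")
```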
Timing
%%timeit
p = df.index.get_loc('StudentAvg')
v = df.values
i, j = np.where(((v / v[p]) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
10000 loops, best of 3: 41.7 µs per loop
%%timeit
avg = df.loc['StudentAvg', :]
i, j = np.where(((df / avg) - 1) < -.05)
list(zip(df.index[i], df.columns[j]))
1000 loops, best of 3: 662 µs per loop
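The `%%timeit` cells above require IPython or Jupyter. Outside a notebook, the standard-library `timeit` module can run a similar comparison. This sketch wraps the two answers in functions, checks that both return the same pairs, and prints a per-call time. `pairs_pandas` and `pairs_numpy` are illustrative names, and the absolute timings will vary with the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
     "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
     "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9]},
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)


def pairs_pandas(df):
    # Divide the whole frame by the StudentAvg row (aligned on column labels)
    avg = df.loc["StudentAvg", :]
    i, j = np.where(((df / avg) - 1) < -0.05)
    return list(zip(df.index[i], df.columns[j]))


def pairs_numpy(df):
    # Same test on the raw array; v[p] broadcasts across every row
    p = df.index.get_loc("StudentAvg")
    v = df.to_numpy()
    i, j = np.where(((v / v[p]) - 1) < -0.05)
    return list(zip(df.index[i], df.columns[j]))


assert pairs_pandas(df) == pairs_numpy(df)  # both approaches agree

n = 1000
for fn in (pairs_pandas, pairs_numpy):
    seconds = timeit.timeit(lambda: fn(df), number=n)
    print(f"{fn.__name__}: {seconds / n * 1e6:.1f} µs per call")
```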
Another approach builds one string per subject with apply:
df.apply(lambda x: str(x.name) + ': ' + ', '.join(
    df[(x - x.loc['StudentAvg']) / x.loc['StudentAvg'] * 100 < -5.0].index.tolist())).values.tolist()
Output:
['ENGLISH: StudentC', 'MATH: ', 'ART: StudentB, StudentC, StudentD, StudentE']
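If you only need the students who are more than 5% below StudentAvg in at least one subject (not broken down by subject), you can reduce the per-subject mask with any(axis=1):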
mask = df.apply(lambda x: (x-x.loc['StudentAvg'])/x.loc['StudentAvg']*100<-5.0).any(axis=1)
df[mask].index.tolist()
Output:
['StudentB', 'StudentC', 'StudentD', 'StudentE']
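The mask above shows only which students are affected. To also see the subjects in which each of those students fell behind, you can drop the StudentAvg row first and compare against 0.95 × StudentAvg, which is the same test for positive averages. This is a sketch, and `low` and `by_student` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"ENGLISH": [95.0, 91.0, 85.0, 97.0, 93.0, 94.5],
     "MATH": [83.0, 93.0, 84.0, 84.0, 88.0, 83.7],
     "ART": [97.0, 47.0, 92.0, 85.0, 85.0, 96.9]},
    index=["StudentA", "StudentB", "StudentC", "StudentD", "StudentE", "StudentAvg"],
)

avg = df.loc["StudentAvg"]              # one average per subject
students = df.drop(index="StudentAvg")  # keep only the real students

# True where a grade is more than 5% below that subject's StudentAvg value
low = students.lt(avg * 0.95)

# For each student with at least one low grade, list the subjects involved
by_student = {
    s: low.columns[low.loc[s].to_numpy()].tolist()
    for s in low.index[low.any(axis=1).to_numpy()]
}
print(by_student)
```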