Recently I have wanted to do some sorting and calculation and to find the maximum in a data frame. For example:
import pandas as pd

data = {'Name': ['Penny', 'Ben', 'Benny', 'Mark'],
        'Eng': [5, 1, 4, 3],
        'Math': [1, 5, 3, 2],
        'Physics': [2, 5, 3, 1],
        'Sports': [4, 5, 2, 3],
        'Total': [12, 16, 12, 9]}
df1 = pd.DataFrame(data, columns=['Name', 'Eng', 'Math', 'Physics', 'Sports', 'Total'])
df1
I want to get the range of each student's scores across the subjects, and I found the function numpy.ptp, which returns the range of values (maximum - minimum) along an axis. So I did this:
import numpy as np
cols_of_interest = ['Eng','Math','Sports','Physics']
np.ptp(df1[cols_of_interest].values, axis=1)
Result
array([4, 4, 2, 2])
Once I have the result, the information from the data frame is lost. For example, I want to find the students with the largest range, which should be (Penny: 4, Ben: 4). When the data size is large, how can I merge this result back into the data frame and find the maximum?
Also, for cols_of_interest = ['Eng','Math','Sports','Physics'], when the list is long (like 100 subjects), is there an elegant way to apply np.ptp?
Many thanks!!
Simply assign the output of np.ptp to a new column:
df1['max_range'] = np.ptp(df1[cols_of_interest].values, axis=1)
Then you can find the maximum with:
max_val = df1['max_range'].max()
or use df1['max_range'].idxmax() if you want the index of the maximum value. Note that idxmax() returns only the first such index when several rows tie.
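Since the question expects both Penny and Ben back, here is a minimal sketch of one way to list every row that ties for the largest range. It rebuilds the question's data so it runs on its own; the `top` variable name is just illustrative, and comparing against the maximum with a boolean mask is one option among several.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'Name': ['Penny', 'Ben', 'Benny', 'Mark'],
    'Eng': [5, 1, 4, 3],
    'Math': [1, 5, 3, 2],
    'Physics': [2, 5, 3, 1],
    'Sports': [4, 5, 2, 3],
    'Total': [12, 16, 12, 9],
})
cols_of_interest = ['Eng', 'Math', 'Sports', 'Physics']

# Row-wise range (max - min), stored as a new column so names stay attached
df1['max_range'] = np.ptp(df1[cols_of_interest].to_numpy(), axis=1)

# Keep every row whose range equals the overall maximum (handles ties)
max_val = df1['max_range'].max()
top = df1.loc[df1['max_range'] == max_val, ['Name', 'max_range']]
print(top)  # rows for Penny and Ben, both with max_range 4
```

If you only need the k largest rows rather than all ties, df1.nlargest(k, 'max_range') may be a simpler fit.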
is there any elegant way to apply np.ptp?
You can access the columns of a dataframe with df1.columns. This returns an Index of column labels; drop the names you do not want from it, then use the remaining labels to select the columns you pass to np.ptp.
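As a rough sketch of that idea, assuming 'Name' and 'Total' are the only columns to exclude (with 100 subjects you would still list only the few non-subject columns):

```python
import numpy as np
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'Name': ['Penny', 'Ben', 'Benny', 'Mark'],
    'Eng': [5, 1, 4, 3],
    'Math': [1, 5, 3, 2],
    'Physics': [2, 5, 3, 1],
    'Sports': [4, 5, 2, 3],
    'Total': [12, 16, 12, 9],
})

# Start from all column labels and drop the ones that are not subjects
cols_of_interest = df1.columns.drop(['Name', 'Total'])

# Row-wise range over the remaining (subject) columns
ranges = np.ptp(df1[cols_of_interest].to_numpy(), axis=1)
print(list(cols_of_interest))  # ['Eng', 'Math', 'Physics', 'Sports']
print(ranges)                  # [4 4 2 2]
```

Listing the columns to exclude keeps the code short when there are many subjects; df1.drop(columns=['Name', 'Total']) would give the same sub-frame directly.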