
Implementing functions with dataframes in python

I have been stuck on this problem for several days.

I have this function:

def cal_score(research, citations, teaching, international, income):
    # weighted sum of the five indicator scores (the weights add up to 1)
    return .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income

where “research”, “citations”, “teaching”, “international” and “income” are columns of the dataset. I want to add a new column to the dataset whose values are calculated by the function above. I tried several approaches, but none worked.

Example : If we have a row as below

university_name       Indian Institute of Technology Bombay
teaching              43.8
international         14.3
research              24.2
citations             8,327
income                14.9
Total Score Ranking

Then the total score should be calculated as

Total Score = .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income

This should be applied to every row in the dataset.

Can anyone please help me implement this? I have been stuck on it for quite some time now. :-(

Indian_univ.head(10).to_dict()

{'citations': {510: 38.799999999999997,
  832: 39.0,
  856: 45.600000000000001,
  959: 45.799999999999997,
  1232: 84.700000000000003,
  1360: 38.5,
  1361: 41.799999999999997,
  1362: 35.299999999999997,
  1363: 53.600000000000001,
  1679: 51.600000000000001},
 'country': {510: 'India',
  832: 'India',
  856: 'India',
  959: 'India',
  1232: 'India',
  1360: 'India',
  1361: 'India',
  1362: 'India',
  1363: 'India',
  1679: 'India'},
 'female_male_ratio': {510: '16 : 84',
  832: '15 : 85',
  856: '16 : 84',
  959: '17 : 83',
  1232: '46 : 54',
  1360: '18 : 82',
  1361: '13 : 87',
  1362: '15 : 85',
  1363: '17 : 83',
  1679: '19 : 81'},
 'income': {510: '24.2',
  832: '72.4',
  856: '52.7',
  959: '70.4',
  1232: '28.4',
  1360: '-',
  1361: '42.4',
  1362: '-',
  1363: '64.8',
  1679: '37.9'},
 'international': {510: '14.3',
  832: '16.1',
  856: '19.9',
  959: '15.6',
  1232: '29.3',
  1360: '15.3',
  1361: '17.3',
  1362: '14.7',
  1363: '15.6',
  1679: '18.2'},
 'international_students': {510: '1%',
  832: '0%',
  856: '1%',
  959: '1%',
  1232: '1%',
  1360: '1%',
  1361: '0%',
  1362: '0%',
  1363: '1%',
  1679: '1%'},
 'num_students': {510: '8,327',
  832: '9,928',
  856: '8,327',
  959: '8,061',
  1232: '16,691',
  1360: '8,371',
  1361: '6,167',
  1362: '9,928',
  1363: '8,061',
  1679: '3,318'},
 'research': {510: 15.699999999999999,
  832: 45.299999999999997,
  856: 33.100000000000001,
  959: 13.699999999999999,
  1232: 14.0,
  1360: 23.0,
  1361: 25.199999999999999,
  1362: 30.0,
  1363: 12.300000000000001,
  1679: 39.5},
 'student_staff_ratio': {510: 14.9,
  832: 17.5,
  856: 14.9,
  959: 18.699999999999999,
  1232: 23.899999999999999,
  1360: 17.300000000000001,
  1361: 12.199999999999999,
  1362: 17.5,
  1363: 18.699999999999999,
  1679: 8.1999999999999993},
 'teaching': {510: 43.799999999999997,
  832: 44.200000000000003,
  856: 47.299999999999997,
  959: 30.399999999999999,
  1232: 25.800000000000001,
  1360: 33.799999999999997,
  1361: 31.300000000000001,
  1362: 39.299999999999997,
  1363: 25.100000000000001,
  1679: 32.600000000000001},
 'total_score': {510: 29.489999999999995,
  832: 38.549999999999997,
  856: 37.799999999999997,
  959: 26.969999999999999,
  1232: 37.350000000000001,
  1360: 28.589999999999996,
  1361: 29.489999999999998,
  1362: 31.379999999999995,
  1363: 27.299999999999997,
  1679: 37.109999999999999},
 'university_name': {510: 'Indian Institute of Technology Bombay',
  832: 'Indian Institute of Technology Kharagpur',
  856: 'Indian Institute of Technology Bombay',
  959: 'Indian Institute of Technology Roorkee',
  1232: 'Panjab University',
  1360: 'Indian Institute of Technology Delhi',
  1361: 'Indian Institute of Technology Kanpur',
  1362: 'Indian Institute of Technology Kharagpur',
  1363: 'Indian Institute of Technology Roorkee',
  1679: 'Indian Institute of Science'},
 'world_rank': {510: '301-350',
  832: '226-250',
  856: '251-275',
  959: '351-400',
  1232: '226-250',
  1360: '351-400',
  1361: '351-400',
  1362: '351-400',
  1363: '351-400',
  1679: '276-300'},
 'year': {510: 2012,
  832: 2013,
  856: 2013,
  959: 2013,
  1232: 2014,
  1360: 2014,
  1361: 2014,
  1362: 2014,
  1363: 2014,
  1679: 2015}}

I think you can use:

df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
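A minimal sketch of why a single column-wise expression is enough: arithmetic on pandas Series is applied element by element, so one expression scores every row. It assumes the intended formula is a weighted sum (multiplication, not exponentiation) and that the columns are already numeric. The two-row frame uses values from the question's sample, and the `weights` dictionary is a hypothetical helper, not something from the original post.

```python
import pandas as pd

# Illustrative frame: two rows from the question's sample, already numeric.
df = pd.DataFrame(
    {
        "research": [15.7, 45.3],
        "citations": [38.8, 39.0],
        "teaching": [43.8, 44.2],
        "international": [14.3, 16.1],
        "income": [24.2, 72.4],
    },
    index=[510, 832],
)

# Hypothetical helper mapping: column name -> weight (the weights sum to 1).
weights = {
    "research": 0.3,
    "citations": 0.3,
    "teaching": 0.3,
    "international": 0.075,
    "income": 0.025,
}

# Multiply each column by its weight, then sum across the columns of each row.
df["Total Score"] = df[list(weights)].mul(pd.Series(weights), axis=1).sum(axis=1)
print(df["Total Score"])
```

Keeping the weights in one mapping makes it easy to check that they sum to 1 and to change them later. For these two rows the scores come out at roughly 31.1675 and 41.5675.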

If you need to use apply with a function, which is usually much slower:

def cal_score(x):
    # x is a single row of the DataFrame, passed in as a Series
    return (.3 * x.research +
            .3 * x.citations +
            .3 * x.teaching +
            .075 * x.international +
            .025 * x.income)

df['Total Score'] = df.apply(cal_score, axis=1)
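As a hedged illustration of the two options, the sketch below reuses the five-argument function from the question and passes it whole columns. Because the function only multiplies and adds, it works on Series as well as on single numbers. The result is then compared with a row-wise `apply`. The three-row frame is illustrative (values from the question's sample) and the formula is assumed to be a weighted sum.

```python
import pandas as pd

# Illustrative frame: three rows from the question's sample, already numeric.
df = pd.DataFrame(
    {
        "research": [15.7, 45.3, 33.1],
        "citations": [38.8, 39.0, 45.6],
        "teaching": [43.8, 44.2, 47.3],
        "international": [14.3, 16.1, 19.9],
        "income": [24.2, 72.4, 52.7],
    },
    index=[510, 832, 856],
)


def cal_score(research, citations, teaching, international, income):
    # Works for scalars and for whole Series alike.
    return (.3 * research + .3 * citations + .3 * teaching
            + .075 * international + .025 * income)


# Vectorized: call the function once with entire columns.
vectorized = cal_score(df.research, df.citations, df.teaching,
                       df.international, df.income)

# Row-wise: call the function once per row (same result, usually slower).
by_apply = df.apply(
    lambda row: cal_score(row.research, row.citations, row.teaching,
                          row.international, row.income),
    axis=1,
)

# Largest absolute difference between the two results.
max_diff = (vectorized - by_apply).abs().max()
print(max_diff)
```

Both routes give the same numbers up to floating-point rounding. The vectorized call is usually preferable on large frames because it avoids a Python-level call per row.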

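EDIT with data:

The income and international columns are stored as strings, so first clean the num_students and income columns with str.replace, then convert the columns used in the formula to float with astype.

EDIT2 with the sample data: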

import pandas as pd

df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})
# remove the thousands separator from num_students
df['num_students'] = df.num_students.str.replace(',', '')
# replace the '-' placeholder for missing income with '0'
df['income'] = df['income'].str.replace('-', '0')

# convert the columns used in the formula to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)

df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
print (df)

      citations country female_male_ratio  income  international  \
510        38.8   India           16 : 84    24.2           14.3   
832        39.0   India           15 : 85    72.4           16.1   
856        45.6   India           16 : 84    52.7           19.9   
959        45.8   India           17 : 83    70.4           15.6   
1232       84.7   India           46 : 54    28.4           29.3   
1360       38.5   India           18 : 82     0.0           15.3   
1361       41.8   India           13 : 87    42.4           17.3   
1362       35.3   India           15 : 85     0.0           14.7   
1363       53.6   India           17 : 83    64.8           15.6   
1679       51.6   India           19 : 81    37.9           18.2   

     international_students num_students  research  student_staff_ratio  \
510                      1%         8327      15.7                 14.9   
832                      0%         9928      45.3                 17.5   
856                      1%         8327      33.1                 14.9   
959                      1%         8061      13.7                 18.7   
1232                     1%        16691      14.0                 23.9   
1360                     1%         8371      23.0                 17.3   
1361                     0%         6167      25.2                 12.2   
1362                     0%         9928      30.0                 17.5   
1363                     1%         8061      12.3                 18.7   
1679                     1%         3318      39.5                  8.2   

      teaching  total_score                           university_name  \
510       43.8        29.49     Indian Institute of Technology Bombay   
832       44.2        38.55  Indian Institute of Technology Kharagpur   
856       47.3        37.80     Indian Institute of Technology Bombay   
959       30.4        26.97    Indian Institute of Technology Roorkee   
1232      25.8        37.35                         Panjab University   
1360      33.8        28.59      Indian Institute of Technology Delhi   
1361      31.3        29.49     Indian Institute of Technology Kanpur   
1362      39.3        31.38  Indian Institute of Technology Kharagpur   
1363      25.1        27.30    Indian Institute of Technology Roorkee   
1679      32.6        37.11               Indian Institute of Science   

     world_rank  year  Total Score
510     301-350  2012      31.1675
832     226-250  2013      41.5675
856     251-275  2013      40.6100
959     351-400  2013      29.9000
1232    226-250  2014      40.2575
1360    351-400  2014      29.7375
1361    351-400  2014      31.8475
1362    351-400  2014      32.4825
1363    351-400  2014      30.0900
1679    276-300  2015      39.4225
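The cleaning step above turns the '-' placeholder into '0', which counts a missing income as a score of zero. A possible alternative, offered only as a sketch and not as what this answer does, is `pd.to_numeric` with `errors="coerce"`: unparseable entries become NaN and can then be filled or left missing explicitly. The three-value Series is illustrative.

```python
import pandas as pd

# Illustrative income column: numbers stored as strings, '-' marks a missing value.
income = pd.Series(["24.2", "72.4", "-"], index=[510, 832, 1360])

# Entries that cannot be parsed as numbers become NaN instead of raising.
income_num = pd.to_numeric(income, errors="coerce")

# Decide explicitly how to treat missing values; filling with 0 mirrors
# the '-' -> '0' replacement, but leaving NaN is also an option.
income_zero = income_num.fillna(0.0)

print(income_num)
print(income_zero)
```

Unlike a plain string replacement of '-', this route does not alter values that merely contain a minus sign, such as a negative number.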

This is the most direct way:

df = df.assign(TotalScore=.3 * df.research + .3 * df.citations + .3 * df.teaching + .075 * df.international + .025 * df.income)
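One point worth spelling out about `assign`: it returns a new DataFrame and leaves the original untouched, so the result has to be kept. The sketch below assumes numeric columns and a weighted-sum formula. The one-row frame is illustrative, and `score` is a hypothetical helper name. It also shows dictionary unpacking as one way to get a column name containing a space.

```python
import pandas as pd

# Illustrative one-row frame with values from the question's sample.
df = pd.DataFrame(
    {
        "research": [15.7],
        "citations": [38.8],
        "teaching": [43.8],
        "international": [14.3],
        "income": [24.2],
    },
    index=[510],
)


def score(d):
    # Hypothetical helper: weighted sum computed on whole columns of d.
    return (.3 * d.research + .3 * d.citations + .3 * d.teaching
            + .075 * d.international + .025 * d.income)


# assign returns a copy with the extra column; df itself is not modified.
scored = df.assign(TotalScore=score)

# Keyword arguments must be valid identifiers, so unpack a dict
# when the column name contains a space.
scored_spaced = df.assign(**{"Total Score": score})

print(scored)
print(scored_spaced)
```

Passing a callable to `assign` makes pandas evaluate it on the frame being assigned to, which is convenient inside method chains.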
