Implementing functions with dataframes in Python
I have this function:
def cal_score(research, citations, teaching, international, income):
    return .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income
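Here research, citations, teaching, international and income are columns of the dataset. I want to add a new column to the dataset whose values are computed with the function above. I have tried several approaches, but none of them worked.
Example: suppose we have a row like the following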
university_name Indian Institute of Technology Bombay
teaching 43.8
international 14.3
research 24.2
citations 8,327
income 14.9
Total Score Ranking
Then the total score should be calculated as
Total Score = .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income.
This should be applied to every row in the dataset.
Can anyone help me implement this? I have been stuck on this problem for a while. :-(
Indian_univ.head(10).to_dict()
{'citations': {510: 38.799999999999997,
832: 39.0,
856: 45.600000000000001,
959: 45.799999999999997,
1232: 84.700000000000003,
1360: 38.5,
1361: 41.799999999999997,
1362: 35.299999999999997,
1363: 53.600000000000001,
1679: 51.600000000000001},
'country': {510: 'India',
832: 'India',
856: 'India',
959: 'India',
1232: 'India',
1360: 'India',
1361: 'India',
1362: 'India',
1363: 'India',
1679: 'India'},
'female_male_ratio': {510: '16 : 84',
832: '15 : 85',
856: '16 : 84',
959: '17 : 83',
1232: '46 : 54',
1360: '18 : 82',
1361: '13 : 87',
1362: '15 : 85',
1363: '17 : 83',
1679: '19 : 81'},
'income': {510: '24.2',
832: '72.4',
856: '52.7',
959: '70.4',
1232: '28.4',
1360: '-',
1361: '42.4',
1362: '-',
1363: '64.8',
1679: '37.9'},
'international': {510: '14.3',
832: '16.1',
856: '19.9',
959: '15.6',
1232: '29.3',
1360: '15.3',
1361: '17.3',
1362: '14.7',
1363: '15.6',
1679: '18.2'},
'international_students': {510: '1%',
832: '0%',
856: '1%',
959: '1%',
1232: '1%',
1360: '1%',
1361: '0%',
1362: '0%',
1363: '1%',
1679: '1%'},
'num_students': {510: '8,327',
832: '9,928',
856: '8,327',
959: '8,061',
1232: '16,691',
1360: '8,371',
1361: '6,167',
1362: '9,928',
1363: '8,061',
1679: '3,318'},
'research': {510: 15.699999999999999,
832: 45.299999999999997,
856: 33.100000000000001,
959: 13.699999999999999,
1232: 14.0,
1360: 23.0,
1361: 25.199999999999999,
1362: 30.0,
1363: 12.300000000000001,
1679: 39.5},
'student_staff_ratio': {510: 14.9,
832: 17.5,
856: 14.9,
959: 18.699999999999999,
1232: 23.899999999999999,
1360: 17.300000000000001,
1361: 12.199999999999999,
1362: 17.5,
1363: 18.699999999999999,
1679: 8.1999999999999993},
'teaching': {510: 43.799999999999997,
832: 44.200000000000003,
856: 47.299999999999997,
959: 30.399999999999999,
1232: 25.800000000000001,
1360: 33.799999999999997,
1361: 31.300000000000001,
1362: 39.299999999999997,
1363: 25.100000000000001,
1679: 32.600000000000001},
'total_score': {510: 29.489999999999995,
832: 38.549999999999997,
856: 37.799999999999997,
959: 26.969999999999999,
1232: 37.350000000000001,
1360: 28.589999999999996,
1361: 29.489999999999998,
1362: 31.379999999999995,
1363: 27.299999999999997,
1679: 37.109999999999999},
'university_name': {510: 'Indian Institute of Technology Bombay',
832: 'Indian Institute of Technology Kharagpur',
856: 'Indian Institute of Technology Bombay',
959: 'Indian Institute of Technology Roorkee',
1232: 'Panjab University',
1360: 'Indian Institute of Technology Delhi',
1361: 'Indian Institute of Technology Kanpur',
1362: 'Indian Institute of Technology Kharagpur',
1363: 'Indian Institute of Technology Roorkee',
1679: 'Indian Institute of Science'},
'world_rank': {510: '301-350',
832: '226-250',
856: '251-275',
959: '351-400',
1232: '226-250',
1360: '351-400',
1361: '351-400',
1362: '351-400',
1363: '351-400',
1679: '276-300'},
'year': {510: 2012,
832: 2013,
856: 2013,
959: 2013,
1232: 2014,
1360: 2014,
1361: 2014,
1362: 2014,
1363: 2014,
1679: 2015}}
I think you can use:
df['Total Score'] = (.3 ** df.research +
                     .3 ** df.citations +
                     .3 ** df.teaching +
                     .075 ** df.international +
                     .025 ** df.income)
If you need apply with a function, which is usually slower:
def cal_score(x):
    return (.3 ** x.research +
            .3 ** x.citations +
            .3 ** x.teaching +
            .075 ** x.international +
            .025 ** x.income)
df['Total Score'] = df.apply(cal_score, axis=1)
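One thing worth checking in the expressions above: in Python `**` is the power operator, so `.3 ** x` raises 0.3 to the power `x` rather than multiplying by it. If the intent is a weighted sum (the weights .3, .3, .3, .075 and .025 add up to 1), `*` would be the operator to use. A minimal sketch on the first sample row (index 510), assuming the weighted-sum reading; the names `powered`, `weighted` and `applied` are only illustrative:

```python
import pandas as pd

# One row taken from the sample data (index 510), already numeric.
df = pd.DataFrame(
    {"research": [15.7], "citations": [38.8], "teaching": [43.8],
     "international": [14.3], "income": [24.2]},
    index=[510],
)

# `**` is exponentiation: each term is a small fraction raised to a large power.
powered = (.3 ** df.research + .3 ** df.citations + .3 ** df.teaching
           + .075 ** df.international + .025 ** df.income)

# `*` is multiplication: a weighted sum of the five columns.
weighted = (.3 * df.research + .3 * df.citations + .3 * df.teaching
            + .075 * df.international + .025 * df.income)


# The same weighted sum computed row by row with apply (usually slower).
def cal_score(x):
    return (.3 * x.research + .3 * x.citations + .3 * x.teaching
            + .075 * x.international + .025 * x.income)


applied = df.apply(cal_score, axis=1)
```

The vectorized expression and the `apply` version should agree; the vectorized form is generally preferable because it avoids a Python-level call per row.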
EDIT by data:
You first need to replace the problem characters in the num_students and income columns, and then cast to float with astype.
EDIT2 by data sample:
import pandas as pd
df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})
# remove the thousands separator
df['num_students'] = df.num_students.str.replace(',', '')
# replace the '-' placeholder with '0'
df['income'] = df['income'].str.replace('-', '0')
# convert columns to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)
# ** is exponentiation, which is why the scores printed below are so small
df['Total Score'] = (.3 ** df.research +
                     .3 ** df.citations +
                     .3 ** df.teaching +
                     .075 ** df.international +
                     .025 ** df.income)
print (df)
citations country female_male_ratio income international \
510 38.8 India 16 : 84 24.2 14.3
832 39.0 India 15 : 85 72.4 16.1
856 45.6 India 16 : 84 52.7 19.9
959 45.8 India 17 : 83 70.4 15.6
1232 84.7 India 46 : 54 28.4 29.3
1360 38.5 India 18 : 82 0.0 15.3
1361 41.8 India 13 : 87 42.4 17.3
1362 35.3 India 15 : 85 0.0 14.7
1363 53.6 India 17 : 83 64.8 15.6
1679 51.6 India 19 : 81 37.9 18.2
international_students num_students research student_staff_ratio \
510 1% 8327 15.7 14.9
832 0% 9928 45.3 17.5
856 1% 8327 33.1 14.9
959 1% 8061 13.7 18.7
1232 1% 16691 14.0 23.9
1360 1% 8371 23.0 17.3
1361 0% 6167 25.2 12.2
1362 0% 9928 30.0 17.5
1363 1% 8061 12.3 18.7
1679 1% 3318 39.5 8.2
teaching total_score university_name \
510 43.8 29.49 Indian Institute of Technology Bombay
832 44.2 38.55 Indian Institute of Technology Kharagpur
856 47.3 37.80 Indian Institute of Technology Bombay
959 30.4 26.97 Indian Institute of Technology Roorkee
1232 25.8 37.35 Panjab University
1360 33.8 28.59 Indian Institute of Technology Delhi
1361 31.3 29.49 Indian Institute of Technology Kanpur
1362 39.3 31.38 Indian Institute of Technology Kharagpur
1363 25.1 27.30 Indian Institute of Technology Roorkee
1679 32.6 37.11 Indian Institute of Science
world_rank year Total Score
510 301-350 2012 6.177371e-09
832 226-250 2013 7.776087e-19
856 251-275 2013 4.928529e-18
959 351-400 2013 6.863746e-08
1232 226-250 2014 4.782972e-08
1360 351-400 2014 1.000000e+00
1361 351-400 2014 6.664022e-14
1362 351-400 2014 1.000000e+00
1363 351-400 2014 3.703322e-07
1679 276-300 2015 9.003721e-18
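Replacing '-' with '0' treats a missing income as a real zero. An alternative, sketched here under the assumption that '-' marks a missing value, is to let `pd.to_numeric` with `errors='coerce'` turn unparseable strings into NaN, so that a score computed from those rows stays NaN instead of relying on a made-up 0. The three-row frame is a reduced copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame(
    {"income": ["24.2", "-", "42.4"],
     "international": ["14.3", "15.3", "17.3"],
     "num_students": ["8,327", "8,371", "6,167"]},
    index=[510, 1360, 1361],
)

# Strings that cannot be parsed as numbers (here '-') become NaN.
for col in ["income", "international"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Drop the thousands separator before converting the student counts.
df["num_students"] = (
    df["num_students"].str.replace(",", "", regex=False).astype(int)
)
```

Whether 0 or NaN is the better choice depends on how the missing incomes should affect the ranking; NaN at least makes the gap visible.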
This is the most direct way:
df.assign(TotalScore=.3 **df.research + .3 **df.citations + .3 **df.teaching +.075 **df.international + .025 **df.income)
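Note that `assign` returns a new DataFrame and leaves `df` unchanged, so the result has to be bound to a name. A short sketch of that, on two rows of the sample data; the `weights` Series, the `scored` name and the use of `*` (a weighted sum) are assumptions for illustration, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame(
    {"research": [15.7, 45.3], "citations": [38.8, 39.0],
     "teaching": [43.8, 44.2], "international": [14.3, 16.1],
     "income": [24.2, 72.4]},
    index=[510, 832],
)

# Hypothetical weights, keyed by column name.
weights = pd.Series({"research": .3, "citations": .3, "teaching": .3,
                     "international": .075, "income": .025})

# Multiply each column by its weight (aligned on column name), sum per row,
# and keep the returned frame; df itself is not modified.
scored = df.assign(
    TotalScore=df[weights.index].mul(weights, axis=1).sum(axis=1)
)
```

Keeping the weights in one Series means a change to a weight touches a single place rather than a long arithmetic expression.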