[英]Implementing functions with dataframes in python
我有這個功能:
def cal_score(research, citations, teaching, international, income):
return .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income
其中“研究”,“引用”,“教學”,“國際”和“收入”是數據集的列。 我想在數據集中添加一個新列,其值應根據上述函數進行計算。 我嘗試了不同的步驟,但沒有任何效果。
示例:如果我們有如下一行
university_name Indian Institute of Technology Bombay
teaching 43.8
international 14.3
research 24.2
citations 8,327
income 14.9
Total Score Ranking
然后,總分應計算為
Total Score = .3 **research + .3 **citations + .3 **teaching +.075 **international + .025 **income.
這應適用於數據集中的所有行。
誰能幫我實現這一要求。 我在這個問題上停留了一段時間。 :-(
Indian_univ.head(10).to_dict()
{'citations': {510: 38.799999999999997,
832: 39.0,
856: 45.600000000000001,
959: 45.799999999999997,
1232: 84.700000000000003,
1360: 38.5,
1361: 41.799999999999997,
1362: 35.299999999999997,
1363: 53.600000000000001,
1679: 51.600000000000001},
'country': {510: 'India',
832: 'India',
856: 'India',
959: 'India',
1232: 'India',
1360: 'India',
1361: 'India',
1362: 'India',
1363: 'India',
1679: 'India'},
'female_male_ratio': {510: '16 : 84',
832: '15 : 85',
856: '16 : 84',
959: '17 : 83',
1232: '46 : 54',
1360: '18 : 82',
1361: '13 : 87',
1362: '15 : 85',
1363: '17 : 83',
1679: '19 : 81'},
'income': {510: '24.2',
832: '72.4',
856: '52.7',
959: '70.4',
1232: '28.4',
1360: '-',
1361: '42.4',
1362: '-',
1363: '64.8',
1679: '37.9'},
'international': {510: '14.3',
832: '16.1',
856: '19.9',
959: '15.6',
1232: '29.3',
1360: '15.3',
1361: '17.3',
1362: '14.7',
1363: '15.6',
1679: '18.2'},
'international_students': {510: '1%',
832: '0%',
856: '1%',
959: '1%',
1232: '1%',
1360: '1%',
1361: '0%',
1362: '0%',
1363: '1%',
1679: '1%'},
'num_students': {510: '8,327',
832: '9,928',
856: '8,327',
959: '8,061',
1232: '16,691',
1360: '8,371',
1361: '6,167',
1362: '9,928',
1363: '8,061',
1679: '3,318'},
'research': {510: 15.699999999999999,
832: 45.299999999999997,
856: 33.100000000000001,
959: 13.699999999999999,
1232: 14.0,
1360: 23.0,
1361: 25.199999999999999,
1362: 30.0,
1363: 12.300000000000001,
1679: 39.5},
'student_staff_ratio': {510: 14.9,
832: 17.5,
856: 14.9,
959: 18.699999999999999,
1232: 23.899999999999999,
1360: 17.300000000000001,
1361: 12.199999999999999,
1362: 17.5,
1363: 18.699999999999999,
1679: 8.1999999999999993},
'teaching': {510: 43.799999999999997,
832: 44.200000000000003,
856: 47.299999999999997,
959: 30.399999999999999,
1232: 25.800000000000001,
1360: 33.799999999999997,
1361: 31.300000000000001,
1362: 39.299999999999997,
1363: 25.100000000000001,
1679: 32.600000000000001},
'total_score': {510: 29.489999999999995,
832: 38.549999999999997,
856: 37.799999999999997,
959: 26.969999999999999,
1232: 37.350000000000001,
1360: 28.589999999999996,
1361: 29.489999999999998,
1362: 31.379999999999995,
1363: 27.299999999999997,
1679: 37.109999999999999},
'university_name': {510: 'Indian Institute of Technology Bombay',
832: 'Indian Institute of Technology Kharagpur',
856: 'Indian Institute of Technology Bombay',
959: 'Indian Institute of Technology Roorkee',
1232: 'Panjab University',
1360: 'Indian Institute of Technology Delhi',
1361: 'Indian Institute of Technology Kanpur',
1362: 'Indian Institute of Technology Kharagpur',
1363: 'Indian Institute of Technology Roorkee',
1679: 'Indian Institute of Science'},
'world_rank': {510: '301-350',
832: '226-250',
856: '251-275',
959: '351-400',
1232: '226-250',
1360: '351-400',
1361: '351-400',
1362: '351-400',
1363: '351-400',
1679: '276-300'},
'year': {510: 2012,
832: 2013,
856: 2013,
959: 2013,
1232: 2014,
1360: 2014,
1361: 2014,
1362: 2014,
1363: 2014,
1679: 2015}}
我認為您可以使用:
df['Total Score'] = .3 **df.research +
.3 **df.citations +
.3 **df.teaching +
.075 **df.international +
.025 **df.income
如果需要apply
功能,通常會比較慢:
def cal_score(x):
return .3 **x.research +
.3 **x.citations +
.3 **x.teaching +
.075 **x.international +
.025 **x.income
df['Total Score'] = df.apply(cal_score, axis=1)
編輯數據:
您需要先replace
num_students
和income
列,然后按astype
轉換為float
:
EDIT2按數據樣本:
import pandas as pd
df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})
#replace , to empty string
df['num_students'] = df.num_students.str.replace(',', '')
#replace - to '0'
df['income'] = df['income'].str.replace('-', '0')
#convert columns to float
df[['teaching', 'international', 'research', 'citations', 'income']] =
df[['teaching', 'international', 'research', 'citations', 'income']].astype(float)
df['Total Score'] = .3 **df.research +
.3 **df.citations +
.3 **df.teaching +
.075 **df.international +
.025 **df.income
print (df)
citations country female_male_ratio income international \
510 38.8 India 16 : 84 24.2 14.3
832 39.0 India 15 : 85 72.4 16.1
856 45.6 India 16 : 84 52.7 19.9
959 45.8 India 17 : 83 70.4 15.6
1232 84.7 India 46 : 54 28.4 29.3
1360 38.5 India 18 : 82 0.0 15.3
1361 41.8 India 13 : 87 42.4 17.3
1362 35.3 India 15 : 85 0.0 14.7
1363 53.6 India 17 : 83 64.8 15.6
1679 51.6 India 19 : 81 37.9 18.2
international_students num_students research student_staff_ratio \
510 1% 8327 15.7 14.9
832 0% 9928 45.3 17.5
856 1% 8327 33.1 14.9
959 1% 8061 13.7 18.7
1232 1% 16691 14.0 23.9
1360 1% 8371 23.0 17.3
1361 0% 6167 25.2 12.2
1362 0% 9928 30.0 17.5
1363 1% 8061 12.3 18.7
1679 1% 3318 39.5 8.2
teaching total_score university_name \
510 43.8 29.49 Indian Institute of Technology Bombay
832 44.2 38.55 Indian Institute of Technology Kharagpur
856 47.3 37.80 Indian Institute of Technology Bombay
959 30.4 26.97 Indian Institute of Technology Roorkee
1232 25.8 37.35 Panjab University
1360 33.8 28.59 Indian Institute of Technology Delhi
1361 31.3 29.49 Indian Institute of Technology Kanpur
1362 39.3 31.38 Indian Institute of Technology Kharagpur
1363 25.1 27.30 Indian Institute of Technology Roorkee
1679 32.6 37.11 Indian Institute of Science
world_rank year Total Score
510 301-350 2012 6.177371e-09
832 226-250 2013 7.776087e-19
856 251-275 2013 4.928529e-18
959 351-400 2013 6.863746e-08
1232 226-250 2014 4.782972e-08
1360 351-400 2014 1.000000e+00
1361 351-400 2014 6.664022e-14
1362 351-400 2014 1.000000e+00
1363 351-400 2014 3.703322e-07
1679 276-300 2015 9.003721e-18
這是最直接的方法:
df.assign(TotalScore=.3 **df.research + .3 **df.citations + .3 **df.teaching +.075 **df.international + .025 **df.income)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.