Implementing functions with DataFrames in Python
I have been stuck on this problem for quite a few days.
I have this function:
def cal_score(research, citations, teaching, international, income):
    return .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income
where "research", "citations", "teaching", "international" and "income" are columns of the dataset. I want to add a new column to the dataset whose values are calculated with the function above. I have tried different approaches, but none of them worked.
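Because pandas arithmetic is element-wise, a function like this (with `*` for the weights) can be called directly on whole columns instead of on single values. A minimal sketch, assuming a two-row frame built from the sample data later in this question:

```python
import pandas as pd

def cal_score(research, citations, teaching, international, income):
    # Weighted sum; works for scalars and, element-wise, for pandas Series
    return (.3 * research + .3 * citations + .3 * teaching
            + .075 * international + .025 * income)

# Two rows taken from the sample data in this question
df = pd.DataFrame({
    'research': [15.7, 45.3],
    'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2],
    'international': [14.3, 16.1],
    'income': [24.2, 72.4],
})

# Passing whole columns computes the score for every row at once
df['Total Score'] = cal_score(df['research'], df['citations'], df['teaching'],
                              df['international'], df['income'])
print(df)
```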
Example: if we have a row like the one below
university_name Indian Institute of Technology Bombay
teaching 43.8
international 14.3
research 15.7
citations 38.8
income 24.2
Total Score, Ranking: (empty, to be calculated)
then the total score should be calculated as
Total Score = .3 * research + .3 * citations + .3 * teaching + .075 * international + .025 * income
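As a worked check of the formula, here it is applied by hand to row 510 (IIT Bombay, 2012) using that row's values from the `to_dict()` output below. The weights .3 + .3 + .3 + .075 + .025 add up to 1.0, so the score is a weighted average on the same scale as the input columns:

```python
# Values for row 510 (IIT Bombay, 2012) from the sample data in this question
research, citations, teaching = 15.7, 38.8, 43.8
international, income = 14.3, 24.2

# 0.3 * (15.7 + 38.8 + 43.8) = 29.49; 0.075 * 14.3 = 1.0725; 0.025 * 24.2 = 0.605
total = (.3 * research + .3 * citations + .3 * teaching
         + .075 * international + .025 * income)
print(round(total, 4))  # 31.1675
```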
This should apply to every row in the dataset.
Can anyone please help me implement this? I have been stuck on it for quite some time now. :-(
Indian_univ.head(10).to_dict()
{'citations': {510: 38.799999999999997,
832: 39.0,
856: 45.600000000000001,
959: 45.799999999999997,
1232: 84.700000000000003,
1360: 38.5,
1361: 41.799999999999997,
1362: 35.299999999999997,
1363: 53.600000000000001,
1679: 51.600000000000001},
'country': {510: 'India',
832: 'India',
856: 'India',
959: 'India',
1232: 'India',
1360: 'India',
1361: 'India',
1362: 'India',
1363: 'India',
1679: 'India'},
'female_male_ratio': {510: '16 : 84',
832: '15 : 85',
856: '16 : 84',
959: '17 : 83',
1232: '46 : 54',
1360: '18 : 82',
1361: '13 : 87',
1362: '15 : 85',
1363: '17 : 83',
1679: '19 : 81'},
'income': {510: '24.2',
832: '72.4',
856: '52.7',
959: '70.4',
1232: '28.4',
1360: '-',
1361: '42.4',
1362: '-',
1363: '64.8',
1679: '37.9'},
'international': {510: '14.3',
832: '16.1',
856: '19.9',
959: '15.6',
1232: '29.3',
1360: '15.3',
1361: '17.3',
1362: '14.7',
1363: '15.6',
1679: '18.2'},
'international_students': {510: '1%',
832: '0%',
856: '1%',
959: '1%',
1232: '1%',
1360: '1%',
1361: '0%',
1362: '0%',
1363: '1%',
1679: '1%'},
'num_students': {510: '8,327',
832: '9,928',
856: '8,327',
959: '8,061',
1232: '16,691',
1360: '8,371',
1361: '6,167',
1362: '9,928',
1363: '8,061',
1679: '3,318'},
'research': {510: 15.699999999999999,
832: 45.299999999999997,
856: 33.100000000000001,
959: 13.699999999999999,
1232: 14.0,
1360: 23.0,
1361: 25.199999999999999,
1362: 30.0,
1363: 12.300000000000001,
1679: 39.5},
'student_staff_ratio': {510: 14.9,
832: 17.5,
856: 14.9,
959: 18.699999999999999,
1232: 23.899999999999999,
1360: 17.300000000000001,
1361: 12.199999999999999,
1362: 17.5,
1363: 18.699999999999999,
1679: 8.1999999999999993},
'teaching': {510: 43.799999999999997,
832: 44.200000000000003,
856: 47.299999999999997,
959: 30.399999999999999,
1232: 25.800000000000001,
1360: 33.799999999999997,
1361: 31.300000000000001,
1362: 39.299999999999997,
1363: 25.100000000000001,
1679: 32.600000000000001},
'total_score': {510: 29.489999999999995,
832: 38.549999999999997,
856: 37.799999999999997,
959: 26.969999999999999,
1232: 37.350000000000001,
1360: 28.589999999999996,
1361: 29.489999999999998,
1362: 31.379999999999995,
1363: 27.299999999999997,
1679: 37.109999999999999},
'university_name': {510: 'Indian Institute of Technology Bombay',
832: 'Indian Institute of Technology Kharagpur',
856: 'Indian Institute of Technology Bombay',
959: 'Indian Institute of Technology Roorkee',
1232: 'Panjab University',
1360: 'Indian Institute of Technology Delhi',
1361: 'Indian Institute of Technology Kanpur',
1362: 'Indian Institute of Technology Kharagpur',
1363: 'Indian Institute of Technology Roorkee',
1679: 'Indian Institute of Science'},
'world_rank': {510: '301-350',
832: '226-250',
856: '251-275',
959: '351-400',
1232: '226-250',
1360: '351-400',
1361: '351-400',
1362: '351-400',
1363: '351-400',
1679: '276-300'},
'year': {510: 2012,
832: 2013,
856: 2013,
959: 2013,
1232: 2014,
1360: 2014,
1361: 2014,
1362: 2014,
1363: 2014,
1679: 2015}}
I think you can use:
df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
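If the weights may change, one option (not from the original answer) is to keep them in a Series and let `DataFrame.dot` do the weighted sum per row. A sketch, assuming the column names from the question; `weights` is a hypothetical name:

```python
import pandas as pd

# Hypothetical weights mapping; index labels are the question's column names
weights = pd.Series({'research': .3, 'citations': .3, 'teaching': .3,
                     'international': .075, 'income': .025})

# Two rows taken from the sample data in this question
df = pd.DataFrame({
    'research': [15.7, 45.3], 'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2], 'international': [14.3, 16.1],
    'income': [24.2, 72.4],
})

# Select the weighted columns in the order of the weights, then take the
# matrix-vector product: one weighted sum per row
df['Total Score'] = df[weights.index].dot(weights)
print(df)
```

Changing a weight then only means editing `weights`, not the expression.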
If you really need a function with apply, note that it is usually much slower, because the function is called once per row:
def cal_score(x):
    return (.3 * x.research +
            .3 * x.citations +
            .3 * x.teaching +
            .075 * x.international +
            .025 * x.income)

df['Total Score'] = df.apply(cal_score, axis=1)
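The same row function also accepts the whole DataFrame, because `x.research` is then a column instead of a single value. This sketch (two sample rows, assumed for illustration) checks that the row-wise and vectorized results agree:

```python
import numpy as np
import pandas as pd

def cal_score(x):
    # x is one row (a Series) under apply(axis=1), or the whole DataFrame
    return (.3 * x.research + .3 * x.citations + .3 * x.teaching
            + .075 * x.international + .025 * x.income)

df = pd.DataFrame({
    'research': [15.7, 45.3], 'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2], 'international': [14.3, 16.1],
    'income': [24.2, 72.4],
})

row_wise = df.apply(cal_score, axis=1)  # Python-level call per row
vectorized = cal_score(df)              # one call on whole columns
print(np.allclose(row_wise, vectorized))  # True
```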
EDIT with data:
You first need to clean the values in the columns num_students and income with replace, and then convert them to float with astype:
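An alternative to replacing '-' with '0' (not used in the original answer) is `pd.to_numeric(..., errors='coerce')`, which turns any unparseable entry into NaN so you can decide explicitly how missing income should count. A sketch on a few sample values:

```python
import pandas as pd

df = pd.DataFrame({'income': ['24.2', '-', '42.4'],
                   'num_students': ['8,327', '8,371', '6,167']})

# Strip the thousands separator, then parse as integers
df['num_students'] = df['num_students'].str.replace(',', '', regex=False).astype(int)

# errors='coerce' turns entries such as '-' into NaN instead of raising
df['income'] = pd.to_numeric(df['income'], errors='coerce')

# fillna(0) reproduces the effect of str.replace('-', '0') followed by astype(float)
df['income_filled'] = df['income'].fillna(0)
print(df)
```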
EDIT2 with a sample of the data:
import pandas as pd
df = pd.DataFrame({'citations': {510: 38.799999999999997, 832: 39.0, 856: 45.600000000000001, 959: 45.799999999999997, 1232: 84.700000000000003, 1360: 38.5, 1361: 41.799999999999997, 1362: 35.299999999999997, 1363: 53.600000000000001, 1679: 51.600000000000001}, 'country': {510: 'India', 832: 'India', 856: 'India', 959: 'India', 1232: 'India', 1360: 'India', 1361: 'India', 1362: 'India', 1363: 'India', 1679: 'India'}, 'female_male_ratio': {510: '16 : 84', 832: '15 : 85', 856: '16 : 84', 959: '17 : 83', 1232: '46 : 54', 1360: '18 : 82', 1361: '13 : 87', 1362: '15 : 85', 1363: '17 : 83', 1679: '19 : 81'}, 'income': {510: '24.2', 832: '72.4', 856: '52.7', 959: '70.4', 1232: '28.4', 1360: '-', 1361: '42.4', 1362: '-', 1363: '64.8', 1679: '37.9'}, 'international': {510: '14.3', 832: '16.1', 856: '19.9', 959: '15.6', 1232: '29.3', 1360: '15.3', 1361: '17.3', 1362: '14.7', 1363: '15.6', 1679: '18.2'}, 'international_students': {510: '1%', 832: '0%', 856: '1%', 959: '1%', 1232: '1%', 1360: '1%', 1361: '0%', 1362: '0%', 1363: '1%', 1679: '1%'}, 'num_students': {510: '8,327', 832: '9,928', 856: '8,327', 959: '8,061', 1232: '16,691', 1360: '8,371', 1361: '6,167', 1362: '9,928', 1363: '8,061', 1679: '3,318'}, 'research': {510: 15.699999999999999, 832: 45.299999999999997, 856: 33.100000000000001, 959: 13.699999999999999, 1232: 14.0, 1360: 23.0, 1361: 25.199999999999999, 1362: 30.0, 1363: 12.300000000000001, 1679: 39.5}, 'student_staff_ratio': {510: 14.9, 832: 17.5, 856: 14.9, 959: 18.699999999999999, 1232: 23.899999999999999, 1360: 17.300000000000001, 1361: 12.199999999999999, 1362: 17.5, 1363: 18.699999999999999, 1679: 8.1999999999999993}, 'teaching': {510: 43.799999999999997, 832: 44.200000000000003, 856: 47.299999999999997, 959: 30.399999999999999, 1232: 25.800000000000001, 1360: 33.799999999999997, 1361: 31.300000000000001, 1362: 39.299999999999997, 1363: 25.100000000000001, 1679: 32.600000000000001}, 'total_score': {510: 29.489999999999995, 832: 38.549999999999997, 856: 
37.799999999999997, 959: 26.969999999999999, 1232: 37.350000000000001, 1360: 28.589999999999996, 1361: 29.489999999999998, 1362: 31.379999999999995, 1363: 27.299999999999997, 1679: 37.109999999999999}, 'university_name': {510: 'Indian Institute of Technology Bombay', 832: 'Indian Institute of Technology Kharagpur', 856: 'Indian Institute of Technology Bombay', 959: 'Indian Institute of Technology Roorkee', 1232: 'Panjab University', 1360: 'Indian Institute of Technology Delhi', 1361: 'Indian Institute of Technology Kanpur', 1362: 'Indian Institute of Technology Kharagpur', 1363: 'Indian Institute of Technology Roorkee', 1679: 'Indian Institute of Science'}, 'world_rank': {510: '301-350', 832: '226-250', 856: '251-275', 959: '351-400', 1232: '226-250', 1360: '351-400', 1361: '351-400', 1362: '351-400', 1363: '351-400', 1679: '276-300'}, 'year': {510: 2012, 832: 2013, 856: 2013, 959: 2013, 1232: 2014, 1360: 2014, 1361: 2014, 1362: 2014, 1363: 2014, 1679: 2015}})
# remove the thousands separator ',' (replace it with an empty string)
df['num_students'] = df.num_students.str.replace(',', '', regex=False)
# replace the placeholder '-' with '0'
df['income'] = df['income'].str.replace('-', '0', regex=False)
# convert the score columns to float
cols = ['teaching', 'international', 'research', 'citations', 'income']
df[cols] = df[cols].astype(float)
df['Total Score'] = (.3 * df.research +
                     .3 * df.citations +
                     .3 * df.teaching +
                     .075 * df.international +
                     .025 * df.income)
print(df)
citations country female_male_ratio income international \
510 38.8 India 16 : 84 24.2 14.3
832 39.0 India 15 : 85 72.4 16.1
856 45.6 India 16 : 84 52.7 19.9
959 45.8 India 17 : 83 70.4 15.6
1232 84.7 India 46 : 54 28.4 29.3
1360 38.5 India 18 : 82 0.0 15.3
1361 41.8 India 13 : 87 42.4 17.3
1362 35.3 India 15 : 85 0.0 14.7
1363 53.6 India 17 : 83 64.8 15.6
1679 51.6 India 19 : 81 37.9 18.2
international_students num_students research student_staff_ratio \
510 1% 8327 15.7 14.9
832 0% 9928 45.3 17.5
856 1% 8327 33.1 14.9
959 1% 8061 13.7 18.7
1232 1% 16691 14.0 23.9
1360 1% 8371 23.0 17.3
1361 0% 6167 25.2 12.2
1362 0% 9928 30.0 17.5
1363 1% 8061 12.3 18.7
1679 1% 3318 39.5 8.2
teaching total_score university_name \
510 43.8 29.49 Indian Institute of Technology Bombay
832 44.2 38.55 Indian Institute of Technology Kharagpur
856 47.3 37.80 Indian Institute of Technology Bombay
959 30.4 26.97 Indian Institute of Technology Roorkee
1232 25.8 37.35 Panjab University
1360 33.8 28.59 Indian Institute of Technology Delhi
1361 31.3 29.49 Indian Institute of Technology Kanpur
1362 39.3 31.38 Indian Institute of Technology Kharagpur
1363 25.1 27.30 Indian Institute of Technology Roorkee
1679 32.6 37.11 Indian Institute of Science
world_rank year Total Score
510 301-350 2012 31.1675
832 226-250 2013 41.5675
856 251-275 2013 40.6100
959 351-400 2013 29.9000
1232 226-250 2014 40.2575
1360 351-400 2014 29.7375
1361 351-400 2014 31.8475
1362 351-400 2014 32.4825
1363 351-400 2014 30.0900
1679 276-300 2015 39.4225
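Another option, not used in the answers here, is `DataFrame.eval`, which parses the formula from a string and resolves bare names to columns. A minimal sketch on two cleaned sample rows:

```python
import pandas as pd

# Two rows from the sample data, already converted to float
df = pd.DataFrame({
    'research': [15.7, 45.3], 'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2], 'international': [14.3, 16.1],
    'income': [24.2, 72.4],
})

# eval returns a Series aligned with df's index; assign it as a new column
df['Total Score'] = df.eval(
    '0.3 * research + 0.3 * citations + 0.3 * teaching'
    ' + 0.075 * international + 0.025 * income'
)
print(df)
```

This keeps the formula readable as one expression, but it still requires the columns to be numeric first.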
This is the most direct way:
df = df.assign(TotalScore=.3 * df.research + .3 * df.citations + .3 * df.teaching + .075 * df.international + .025 * df.income)
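`assign` returns a new DataFrame and leaves the original unchanged, so the result has to be reassigned. It also accepts a callable that receives the frame, which is convenient in method chains, and dict unpacking allows a column name containing a space. A sketch on two sample rows:

```python
import pandas as pd

df = pd.DataFrame({
    'research': [15.7, 45.3], 'citations': [38.8, 39.0],
    'teaching': [43.8, 44.2], 'international': [14.3, 16.1],
    'income': [24.2, 72.4],
})

# The lambda is evaluated on the frame being assigned to; ** passes the
# column name 'Total Score' (with a space) as a keyword argument
df = df.assign(**{'Total Score': lambda d: .3 * d.research + .3 * d.citations
                  + .3 * d.teaching + .075 * d.international
                  + .025 * d.income})
print(df)
```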