Add a new column to a Pandas dataframe with a value from a function
I know this is similar to other questions, but I can't find a solution that I can make work.
I have a dataframe that contains grades and looks similar to this:
subj1 subj2 subj3 subj4
0 A B A B
1 B B C B
2 C C B A
I want to append a GPA score in a new column so that the result is this:
subj1 subj2 subj3 subj4 GPA
0 A B A B 3.5
1 B B C B 2.8
2 C D B A 2.5
The function I use to calculate the GPA is this:
def calcgpa():
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        GPA = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA = tot / c
        return GPA
I thought that df["GPA"] = pd.Series(calcgpa()) would work, but it only adds a value to the first row. All others are NaN. Trying to use pd.apply or pd.assign just gave me an AssertionError.
Is the problem with how the function returns the GPA, or what is the proper syntax I need to add the new column?
Assuming you only have grades A-E (if you have anything else, make sure you replace it with zero first), you can then do:
df['GPA'] = df.replace({'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}).mean(axis=1)
df
subj1 subj2 subj3 subj4 GPA
0 A B A B 3.50
1 B B C B 2.75
2 C C B A 2.75
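As a hedged, self-contained variant of the same idea, the sketch below maps each grade to points with Series.map and treats any unrecognised grade as 0 before taking the row mean. The names grade_points, subjects, points and gpa_values are illustrative assumptions, not part of the original answer, and the sample data simply mirrors the question.

```python
import pandas as pd

# Sample data mirroring the question
df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# Hypothetical lookup table; point values match the answer above
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}
subjects = ["subj1", "subj2", "subj3", "subj4"]

# Map every subject column to points; unknown grades become NaN, then 0
points = df[subjects].apply(lambda col: col.map(grade_points)).fillna(0)

# Row-wise mean gives one GPA per student
df["GPA"] = points.mean(axis=1)
gpa_values = df["GPA"].tolist()
```

Selecting the subject columns explicitly means the line can be re-run after the GPA column exists without the GPA itself being averaged in.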
If you look at the output of calcgpa(), it is a single float, 3.5, not a list of GPAs, which is why your output only has one value followed by NaNs.
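The NaNs come from index alignment: a Series built from one float has only index label 0, so only row 0 of the dataframe receives a value. A minimal sketch of that behaviour, with the value 3.5 hard-coded in place of the function call and the variable name single as an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# A Series built from a single float has exactly one entry, at index 0
single = pd.Series(3.5)

# Assignment aligns on the index: rows 1 and 2 have no match and get NaN
df["GPA"] = single
missing = df["GPA"].isna().tolist()
```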
I would suggest that your code store each GPA value in a list and assign that list as the column instead. This requires some small changes to your code:
Replace GPA = 0 with GPA = [] to turn it into a list, and move it to the top of the function, outside both for loops. Then change GPA = tot / c to GPA.append(tot / c) so that each GPA is appended to the list, and return the list only after the outer loop has finished.
Full code:
def calcgpa():
    GPA = []
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA.append(tot / c)
    return GPA
You can then assign this to the GPA column like this:
df["GPA"] = calcgpa()
Output:
subj1 subj2 subj3 subj4 GPA
0 A B A B 3.50
1 B B C B 2.75
2 C C B A 2.75
As posted in the other answer, there are more efficient ways to achieve this, but as your code was close, I thought I would amend it to achieve the result.
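Since the question mentions trying apply, here is one possible sketch of the same per-row logic written for DataFrame.apply with axis=1 (apply is a method on the dataframe, not on the pd module). The helper row_gpa and the names grade_points and subjects are hypothetical, introduced only for illustration; like the loop above, any unrecognised grade scores 0 but still counts as a subject.

```python
import pandas as pd

df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# Hypothetical lookup table matching the if/elif chain above
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1}
subjects = ["subj1", "subj2", "subj3", "subj4"]

def row_gpa(row):
    """Return the mean grade points for one row of grades."""
    # Unrecognised grades add 0 points but still count as a subject
    total = sum(grade_points.get(grade, 0) for grade in row)
    return total / len(row)

# axis=1 calls row_gpa once per row, passing that row as a Series
df["GPA"] = df[subjects].apply(row_gpa, axis=1)
gpa_list = df["GPA"].tolist()
```

Because the function receives one row and returns one number, the result already has one value per row and aligns with the dataframe's index.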