Add a new column to a Pandas dataframe with a value from a function

I know this is similar to other questions, but I can't find a solution that I can get to work.

I have a dataframe of grades that looks similar to this:

  subj1 subj2 subj3 subj4
0   A     B     A     B
1   B     B     C     B
2   C     C     B     A

I want to add a GPA score in a new column so that the result looks like this:

  subj1 subj2 subj3 subj4 GPA
0   A     B     A     B   3.5
1   B     B     C     B   2.75
2   C     C     B     A   2.75

The function I use to calculate the GPA is this:

def calcgpa():
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        GPA = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA = tot / c
        return GPA

I thought that df["GPA"] = pd.Series(calcgpa()) would work, but it only adds a value to the first row; all the others are NaN. Trying to use apply or assign just gave me an AssertionError.

Is the problem with how the function returns the GPA, or is there a different syntax I need to use to add the new column?

Assuming you only have grades A to E (if you have anything else, replace it with zero first), you can do:

df['GPA'] = df.replace({'A': 4, 'B': 3, 'C': 2, 'D': 1, 'E': 0}).mean(axis=1)
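The one-liner above assumes the dataframe holds only grade columns. As a sketch under that same assumption, the version below uses Series.map with a lookup dict instead of replace, so any value not in the dict (for example a hypothetical "F" or a missing grade) becomes NaN and is then filled with 0, which matches how the question's loop counts unknown grades. The names GRADE_POINTS and add_gpa are illustrative, not from the original answer.

```python
import pandas as pd

# Illustrative grade-to-points mapping (assumed 4-point scale, as in the question).
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}


def add_gpa(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a GPA column averaged across all existing columns."""
    # Map each column's letter grades to points; grades not in the dict become NaN.
    points = df.apply(lambda col: col.map(GRADE_POINTS))
    # Treat unknown grades as 0 points but still count them, like the loop's else branch.
    points = points.fillna(0)
    out = df.copy()
    out["GPA"] = points.mean(axis=1)
    return out


grades = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})
print(add_gpa(grades))
```

With the sample grades from the question, the GPA column comes out as 3.5, 2.75 and 2.75. Returning a copy keeps the original grade-only frame untouched, so calling it twice does not average the GPA column into itself.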

df 
  subj1 subj2 subj3 subj4   GPA
0     A     B     A     B  3.50
1     B     B     C     B  2.75
2     C     C     B     A  2.75

If you look at the output of calcgpa(), it is a single float, 3.5, not a list of GPAs. That is because return GPA is indented inside the outer for loop, so the function returns after processing only the first row. Hence your column only gets one value, followed by NaNs.
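The NaNs come from index alignment: pd.Series(3.5) is a one-element series whose only index label is 0, and assigning a Series to a column aligns it on the index, so only row 0 finds a match. A minimal sketch of this behaviour, using a small made-up dataframe (the column names GPA_scalar and GPA_list are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"subj1": ["A", "B", "C"]})

# A scalar wrapped in a Series has a single index label, 0.
df["GPA"] = pd.Series(3.5)  # aligned on the index: rows 1 and 2 have no match -> NaN

# A bare scalar is broadcast to every row instead.
df["GPA_scalar"] = 3.5

# A plain list of the right length is assigned positionally, one value per row.
df["GPA_list"] = [3.5, 2.75, 2.75]
print(df)
```

This is why the fix is to have the function return one value per row (a list), rather than changing how the column is assigned.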

I would suggest storing each GPA value in a list and assigning that list as the column instead. This requires a few small changes to your code:

Replace GPA = 0 with GPA = [] to turn it into a list, and move it to the top of the function, outside both for loops. Then change GPA = tot / c to GPA.append(tot / c), so that each row's GPA is added to the list that will become the new GPA column. Finally, dedent return GPA so it runs after the outer loop has finished instead of inside it.

Full code:

def calcgpa():
    GPA = []
    for row in df.itertuples(index=False):
        tot = 0
        c = 0
        for i in range(len(row)):
            if row[i] == "A":
                tot = tot + 4
                c += 1
            elif row[i] == "B":
                tot = tot + 3
                c += 1
            elif row[i] == "C":
                tot = tot + 2
                c += 1
            elif row[i] == "D":
                tot = tot + 1
                c += 1
            else:
                c += 1
        GPA.append(tot / c)
    return GPA

You can then assign this to the GPA column like this:

df["GPA"] = calcgpa()

Output:

  subj1 subj2 subj3 subj4   GPA
0     A     B     A     B  3.50
1     B     B     C     B  2.75
2     C     C     B     A  2.75

As the other answer shows, there are more efficient ways to do this, but since your code was close, I thought I would amend it to get the result.
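Since the question also asked about apply: a row-wise function can be passed to DataFrame.apply with axis=1, which calls it once per row and collects the results into a Series aligned with the dataframe's index. The sketch below assumes the dataframe holds only grade columns; row_gpa and GRADE_POINTS are hypothetical names, and the dict lookup replaces the if/elif chain while keeping the same scoring (unknown grades score 0 but still count).

```python
import pandas as pd

# Hypothetical lookup table replacing the if/elif chain from the question.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1}


def row_gpa(row: pd.Series) -> float:
    """Average grade points for one row; unknown grades score 0 but still count."""
    total = sum(GRADE_POINTS.get(grade, 0) for grade in row)
    return total / len(row)


df = pd.DataFrame({
    "subj1": ["A", "B", "C"],
    "subj2": ["B", "B", "C"],
    "subj3": ["A", "C", "B"],
    "subj4": ["B", "B", "A"],
})

# axis=1 passes each row (as a Series) to row_gpa, one call per row.
df["GPA"] = df.apply(row_gpa, axis=1)
print(df)
```

Unlike calcgpa, row_gpa takes the row as an argument instead of reading a global df. It still runs Python code once per row, so on large frames the vectorized replace/mean version will usually be faster. If the frame already has a GPA column, pass only the grade columns (e.g. df[["subj1", "subj2", "subj3", "subj4"]]) so the old GPA is not counted as a grade.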
