
Python pandas: Updating row based on value from another column

I have a pandas DataFrame, df, like:

name   | grade | grade_type
---------------------------
sarah  | B     | letter  
alice  | A     | letter
eliza  | C     | letter
beth   | 76    | numeral
jones  | 90    | numeral

All values in df are strings, including the numbers. I want to convert the numeric grade values into letters, based on checking the grade_type column, to get:

name   | grade | grade_type
---------------------------
sarah  | B     | letter  
alice  | A     | letter
eliza  | C     | letter
beth   | B     | numeral
jones  | A     | numeral

For completeness, the numeral-to-letter grade conversions are:

A: grade > 80
B: 70 < grade <= 80
C: 60 < grade <= 70

Why doesn't this work?

for index, row in df.iterrows():
  if row.grade_type == "numeral":
    grade_val = int(row.grade.values[0])
    if grade_val > 80:
      row.grade = "A" # This assignment doesn't update row.grade!
    elif...

The alternative is using df.apply(...lambda:...), but I'm not too sure how to pull that off, since we have to check the grade_type column before deciding whether or not to update the grade value.

Your DataFrame doesn't update because the rows yielded by iterrows() are not guaranteed to be views into it. Depending on the dtypes, each row can be a copy, so an assignment to row may never reach df.

You can use the index returned from iterrows() and modify the DataFrame directly:

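for index, row in df.iterrows():
    if row['grade_type'] == 'numeral':  # leave letter grades untouched
        grade_val = int(row['grade'])   # grades are stored as strings
        if grade_val > 80:
            df.loc[index, 'grade'] = 'A'  # write to df, not to row
        ...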
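
A fuller sketch of the loop above, filled in with the thresholds from the question. The helper name to_letter is hypothetical, and because the question defines no letter for grades of 60 or below, this version assumes such rows should simply be left alone:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})


def to_letter(grade_val):
    """Map a numeric grade to a letter using the question's thresholds."""
    if grade_val > 80:
        return "A"
    if grade_val > 70:
        return "B"
    if grade_val > 60:
        return "C"
    # Assumption: no letter is defined for 60 or below, so signal "no change".
    return None


for index, row in df.iterrows():
    if row["grade_type"] != "numeral":
        continue  # letter grades are already in the target form
    letter = to_letter(int(row["grade"]))  # grades are stored as strings
    if letter is not None:
        df.loc[index, "grade"] = letter  # assign through df, not through row

print(df)
```

Only the cells addressed through df.loc change; the grade_type column is left as it was.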

Or, as you said, you can use df.apply() and pass it a custom function:

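def get_grades(x):
    if x['grade_type'] == 'letter':
        return x['grade']        # already a letter, keep it
    grade_val = int(x['grade'])  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...


df['grade'] = df.apply(get_grades, axis=1)  # axis=1 passes each row to get_grades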

You can also use a conditional expression (if ... else) in your lambda to check whether x['grade_type'] is 'numeral', as follows. Use whichever version you find easier to read.

def get_grades(grade):
    grade_val = int(grade)  # grades are stored as strings
    if grade_val > 80:
        return "A"
    ...

df['grade'] = df.apply(lambda x: get_grades(x['grade'])
                       if x['grade_type'] == 'numeral' else x['grade'], axis=1)
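
For reference, here is a self-contained sketch of the apply() approach run against the sample data from the question. The helper name to_letter is hypothetical, and returning the original value for grades of 60 or below is an assumption, since the question gives no letter for that range:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["sarah", "alice", "eliza", "beth", "jones"],
    "grade": ["B", "A", "C", "76", "90"],
    "grade_type": ["letter", "letter", "letter", "numeral", "numeral"],
})


def to_letter(grade):
    """Convert a numeric grade given as a string to a letter."""
    grade_val = int(grade)
    if grade_val > 80:
        return "A"
    if grade_val > 70:
        return "B"
    if grade_val > 60:
        return "C"
    # Assumption: no letter is defined for 60 or below, so return the input as-is.
    return grade


# axis=1 hands each row to the lambda; only 'numeral' rows are converted.
df["grade"] = df.apply(
    lambda x: to_letter(x["grade"]) if x["grade_type"] == "numeral" else x["grade"],
    axis=1,
)

print(df)
```

Because apply() builds a new column and the result is assigned back to df['grade'] in one step, nothing is modified while the rows are being iterated.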
