
How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?

I have a Pandas dataframe and I need to divide by 100 every value in a certain column that is greater than 800. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra places to the left of the decimal point. For example...

id    credit_score    column_b    column_c
0     750             ...         ...
1     653             ...         ...
2     741             ...         ...
3     65100           ...         ...
4     73500           ...         ...
5     565             ...         ...
6     480             ...         ...
7     78900           ...         ...
8     699             ...         ...
9     71500           ...         ...

So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.

I'd use Pandas boolean indexing:

In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100

In [194]: df
Out[194]:
    credit_score
id
0          750.0
1          653.0
2          741.0
3          651.0
4          735.0
5          565.0
6          480.0
7          789.0
8          699.0
9          715.0
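
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same boolean-indexing idea. It rebuilds the sample data from the question and writes to a new 'credit_score_fixed' column (the name suggested in the question) so the raw values stay available. The variable name too_large and the up-front cast to float are my own choices, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# Start from a float copy so the division does not have to change the dtype
# of an integer column (assumption: keeping the original column is desired).
df["credit_score_fixed"] = df["credit_score"].astype(float)

# Boolean mask: True only for the rows that look mis-entered.
too_large = df["credit_score_fixed"] > 800

# Divide only the selected rows; all other rows are left untouched.
df.loc[too_large, "credit_score_fixed"] /= 100

print(df)
```

To overwrite the original values instead, the same .loc line can target 'credit_score' directly, as in the one-liner above.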

You can use mask:

df['col'] = df.credit_score.mask(df.credit_score > 800, df.credit_score / 100)

Or numpy.where:

import numpy as np
df['col1'] = np.where(df.credit_score > 800, df.credit_score / 100, df.credit_score)

print (df)
   id  credit_score    col   col1
0   0           750  750.0  750.0
1   1           653  653.0  653.0
2   2           741  741.0  741.0
3   3         65100  651.0  651.0
4   4         73500  735.0  735.0
5   5           565  565.0  565.0
6   6           480  480.0  480.0
7   7         78900  789.0  789.0
8   8           699  699.0  699.0
9   9         71500  715.0  715.0
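
If it helps, here is a small standalone sketch that checks the two approaches give the same result on the question's data. The names scores, too_large, via_mask and via_where are just for illustration. mask replaces values where the condition is True, while numpy.where picks from its second argument where the condition is True and from its third otherwise.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
scores = pd.Series(
    [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500],
    name="credit_score",
)
too_large = scores > 800

# Series.mask: where the condition is True, take the value from `other`.
via_mask = scores.mask(too_large, scores / 100)

# numpy.where returns a plain ndarray, so wrap it to get the index back.
via_where = pd.Series(
    np.where(too_large, scores / 100, scores),
    index=scores.index,
    name="credit_score",
)

# Both routes should agree element by element.
print(via_mask.equals(via_where))
```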

You can use Series.apply. It accepts a function and applies it to every element in the series. Note that it does not operate in place, so you need to assign the series it returns either to a new column or back to the same column.

def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #      return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
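
As a rough variation on the above, the cutoff and divisor can be made parameters instead of being hard-coded. Series.apply forwards extra keyword arguments to the function. The parameter names threshold and divisor are hypothetical, chosen only for this sketch.

```python
import pandas as pd


def fix_score(score, threshold=800, divisor=100):
    """Scale down a single score if it exceeds the threshold."""
    return score / divisor if score > threshold else score


# Sample data copied from the question.
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]}
)

# Extra keyword arguments given to apply are passed through to fix_score.
df["credit_score_fixed"] = df["credit_score"].apply(
    fix_score, threshold=800, divisor=100
)

print(df)
```

Keep in mind that apply calls the Python function once per element, so on large frames the boolean-indexing, mask, or numpy.where approaches are usually faster.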
