How do I perform a math operation on a Python Pandas dataframe column, but only if a certain condition is met?
I have a Pandas dataframe and I simply need to divide by 100 every value in a certain column that is greater than 800. In other words, if the value in the 'credit_score' column is greater than 800, it can be assumed that the data were entered with two extra digits to the left of the decimal point. For example...
id credit_score column_b column_c
0 750 ... ...
1 653 ... ...
2 741 ... ...
3 65100 ... ...
4 73500 ... ...
5 565 ... ...
6 480 ... ...
7 78900 ... ...
8 699 ... ...
9 71500 ... ...
So I basically want to divide the credit scores for row indexes 3, 4, 7, and 9 by 100, but not the others. I want the new, valid values to replace the old, invalid ones. Alternatively, a new column such as 'credit_score_fixed' would work too. I'm fairly new to Python and Pandas, so any help is much appreciated.
I'd use Pandas boolean indexing:
In [193]: df.loc[df.credit_score > 800, 'credit_score'] /= 100
In [194]: df
Out[194]:
credit_score
id
0 750.0
1 653.0
2 741.0
3 651.0
4 735.0
5 565.0
6 480.0
7 789.0
8 699.0
9 715.0
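If you would rather keep the original values, the same boolean indexing can write to a new column instead. The following is only a sketch: the sample data is copied from the question, the name too_large is made up for illustration, and the explicit cast to float is my addition so that the result does not depend on how a given pandas version handles writing division results into an integer column.

```python
import pandas as pd

# Sample data copied from the question (only the relevant column).
df = pd.DataFrame(
    {"credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500]},
    index=pd.Index(range(10), name="id"),
)

# Start the new column as a float copy of the original (assumption: floats are acceptable).
df["credit_score_fixed"] = df["credit_score"].astype(float)

# Boolean mask selecting the rows that were entered with two extra digits.
too_large = df["credit_score"] > 800

# Overwrite only the selected rows of the new column; other rows stay as they are.
df.loc[too_large, "credit_score_fixed"] = df.loc[too_large, "credit_score"] / 100

print(df)
```

The original credit_score column is left untouched, so the two columns can be compared row by row before dropping the old one.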
You can use mask:
df.credit_score = df.credit_score.mask(df.credit_score > 800, df.credit_score / 100)
Or numpy.where:
import numpy as np
df.credit_score = np.where(df.credit_score > 800, df.credit_score / 100, df.credit_score)
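The output below appears to come from a run that stored the mask result in a new column col and the numpy.where result in col1, leaving credit_score unchanged for comparison:
print(df)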
id credit_score col col1
0 0 750 750.0 750.0
1 1 653 653.0 653.0
2 2 741 741.0 741.0
3 3 65100 651.0 651.0
4 4 73500 735.0 735.0
5 5 565 565.0 565.0
6 6 480 480.0 480.0
7 7 78900 789.0 789.0
8 8 699 699.0 699.0
9 9 71500 715.0 715.0
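For reference, here is a self-contained sketch that yields a frame with the same columns as the output above. It assumes the results are written to the new columns col and col1 rather than back to credit_score; the names too_large and scaled are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": range(10),
    "credit_score": [750, 653, 741, 65100, 73500, 565, 480, 78900, 699, 71500],
})

too_large = df["credit_score"] > 800   # rows that need fixing
scaled = df["credit_score"] / 100      # every value divided by 100

# Series.mask replaces values where the condition is True with the second argument.
df["col"] = df["credit_score"].mask(too_large, scaled)

# numpy.where takes the second argument where the condition is True, else the third.
df["col1"] = np.where(too_large, scaled, df["credit_score"])

print(df)
```

Both calls are vectorized and give the same values here. The main practical difference is that mask returns a Series that keeps the index, while numpy.where returns a plain array.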
You can use Series.apply. It accepts a function and applies it to every element in the series. Note that it does not operate in place, so you need to assign the series it returns either to a new column or back to the same column.
def fix_scores(score):
    return score / 100 if score > 800 else score
    # same as
    # if score > 800:
    #     return score / 100
    # return score

df['credit_score_fixed'] = df['credit_score'].apply(fix_scores)
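To see that apply really leaves the original series alone, here is a small sketch on a shortened sample (four of the values from the question; the variable names scores and fixed are just for illustration):

```python
import pandas as pd

def fix_scores(score):
    # Divide by 100 only when the value is implausibly large.
    return score / 100 if score > 800 else score

scores = pd.Series([750, 65100, 565, 78900], name="credit_score")

# apply returns a new Series; scores itself is not modified.
fixed = scores.apply(fix_scores)

print(scores.tolist())
print(fixed.tolist())
```

Since apply calls the Python function once per element, it will generally be slower on large frames than the vectorized approaches in the other answers, though it is easy to read and extend.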