Replace value in Pandas DataFrame based on condition
I have a DataFrame column with some numeric values. I want these values to be replaced by 1 and 0 based on a given condition: if the value is above the mean of the column, change it to 1; otherwise, set it to 0.
Here is the code I have now:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('data.csv')
dataset = dataset.dropna(axis=0, how='any')
X = dataset.drop(['myCol'], axis=1)
y = dataset.iloc[:, 4:5].values
mean_y = np.mean(dataset.myCol)
The target is the dataframe y. y looks like this:
0
0 16
1 13
2 12.5
3 12
and so on. mean_y is equal to 3.55. Therefore, I need all values greater than 3.55 to become 1, and the rest to become 0.
I applied this loop, but without success:
for i in dataset.myCol:
    if dataset.myCol[i] > mean_y:
        dataset.myCol[i] = 1
    else:
        dataset.myCol[i] = 0
The output is the following:
0
0 16
1 13
2 0
3 12
What am I doing wrong? Can someone please explain my mistake?
Thank you!
Try this vectorized approach:
dataset.myCol = np.where(dataset.myCol > dataset.myCol.mean(), 1, 0)
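A minimal, self-contained sketch of the np.where approach, assuming a small hypothetical DataFrame in place of data.csv (which is not available here). It uses bracket indexing, which is equivalent to attribute access for an existing column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv.
dataset = pd.DataFrame({"myCol": [16, 13, 12.5, 12, 0.5, 1.0]})

# The mean is computed once, before the column is overwritten.
threshold = dataset["myCol"].mean()

# np.where picks 1 where the condition is True and 0 where it is False,
# comparing every value against the mean in one vectorized step.
dataset["myCol"] = np.where(dataset["myCol"] > threshold, 1, 0)
print(dataset)
```

Here the mean is 55 / 6 ≈ 9.17, so the first four rows become 1 and the last two become 0.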
Convert the boolean mask to integers, so that True becomes 1 and False becomes 0:
print (dataset.myCol > mean_y)
0 True
1 False
2 False
3 False
Name: myCol, dtype: bool
dataset.myCol = (dataset.myCol > mean_y).astype(int)
print (dataset)
myCol
0 1
1 0
2 0
3 0
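One detail worth checking with either approach: the strict comparison > maps a value exactly equal to the mean to 0. A small sketch with hypothetical values whose mean (2.0) appears in the column shows the difference if >= is used instead:

```python
import pandas as pd

# Hypothetical values; their mean (2.0) is itself one of the values.
s = pd.Series([1, 2, 3], name="myCol")

# The comparison gives a boolean Series; astype(int) maps True -> 1, False -> 0.
strict = (s > s.mean()).astype(int)      # a value equal to the mean becomes 0
inclusive = (s >= s.mean()).astype(int)  # a value equal to the mean becomes 1

print(strict.tolist())
print(inclusive.tolist())
```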
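Your loop iterates over the values of myCol, not over its index labels, so dataset.myCol[i] uses each value as an index label and reads and writes the wrong rows. If you want to keep a loop (not recommended, because it is slow), use iterrows to get the index labels, then set values by index label and column name with loc: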
for i, x in dataset.iterrows():
    if dataset.loc[i, 'myCol'] > mean_y:
        dataset.loc[i, 'myCol'] = 1
    else:
        dataset.loc[i, 'myCol'] = 0
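A short sketch, again assuming hypothetical sample data, that makes the loop bug visible and shows a lighter loop alternative. Swapping Series.items() in for iterrows is my own substitution, not part of the answer above: it yields (index label, value) pairs for just the one column, so the loop can write back to the correct row with loc:

```python
import pandas as pd

# Hypothetical sample data standing in for data.csv.
dataset = pd.DataFrame({"myCol": [16, 13, 12.5, 12, 0.5, 1.0]})
mean_y = dataset["myCol"].mean()

# Iterating over a Series yields its values, not its index labels,
# which is why dataset.myCol[i] in the question hits the wrong rows.
values_seen = list(dataset["myCol"])

# Series.items() yields (index label, value) pairs instead.
for label, value in dataset["myCol"].items():
    dataset.loc[label, "myCol"] = 1 if value > mean_y else 0

print(values_seen)
print(dataset["myCol"].tolist())
```

Because the column starts as float64, the loop leaves 1.0 and 0.0 in it; the vectorized versions above produce integer columns directly and are still the better choice.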