Update multiple column values based on a condition in Python
I have a dataframe like this:
ID 00:00 01:00 02:00 ... 23:00 avg_value
22 4.7 5.3 6 ... 8 5.5
37 0 9.2 4.5 ... 11.2 9.2
4469 2 9.8 11 ... 2 6.4
Can I use np.where to apply a condition to multiple columns at once? I want to update the values in columns 00:00 through 23:00 to 0 and 1. If the value at a given time of day is greater than avg_value, I change it to 1, otherwise to 0.
I know how to apply this method to a single column:
np.where(df['00:00']>df['avg_value'],1,0)
Can I extend it to multiple columns?
The output should look like this:
ID 00:00 01:00 02:00 ... 23:00 avg_value
22 0 0 1 ... 1 5.5
37 0 0 0 ... 1 9.2
4469 0 1 1 ... 0 6.4
Select all columns except the last one with DataFrame.iloc, compare them against avg_value with DataFrame.gt, cast the result to integers, and finally add the avg_value column back with DataFrame.join:
df = df.iloc[:, :-1].gt(df['avg_value'], axis=0).astype(int).join(df['avg_value'])
print (df)
00:00 01:00 02:00 23:00 avg_value
ID
22 0 0 1 1 5.5
37 0 0 0 1 9.2
4469 0 1 1 0 6.4
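For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes ID is the index (as in the printed output above) and only includes the four hourly columns visible in the question; the columns hidden behind "..." are left out rather than guessed.

```python
import pandas as pd

# Sample data copied from the question; columns elided by "..." are omitted.
df = pd.DataFrame(
    {
        "00:00": [4.7, 0.0, 2.0],
        "01:00": [5.3, 9.2, 9.8],
        "02:00": [6.0, 4.5, 11.0],
        "23:00": [8.0, 11.2, 2.0],
        "avg_value": [5.5, 9.2, 6.4],
    },
    index=pd.Index([22, 37, 4469], name="ID"),  # assumption: ID is the index
)

# Compare every hourly column with avg_value row by row (axis=0 aligns the
# Series with the row index), convert True/False to 1/0, then re-attach
# avg_value as the last column.
result = (
    df.iloc[:, :-1]
    .gt(df["avg_value"], axis=0)
    .astype(int)
    .join(df["avg_value"])
)
print(result)
```

Note that gt is a strict comparison, so a value equal to avg_value (9.2 at 01:00 for ID 37) becomes 0.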
Or use DataFrame.pop to extract the column:
s = df.pop('avg_value')
df = df.gt(s, axis=0).astype(int).join(s)
print (df)
00:00 01:00 02:00 23:00 avg_value
ID
22 0 0 1 1 5.5
37 0 0 0 1 9.2
4469 0 1 1 0 6.4
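Since the question asked about np.where specifically: it also accepts a two-dimensional boolean mask, so it can cover all hourly columns in one call. The sketch below is one possible way to do that, not the only one. The names hour_cols, mask and flags are only illustrative, and the sample frame again contains just the columns visible in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; columns elided by "..." are omitted.
df = pd.DataFrame(
    {
        "00:00": [4.7, 0.0, 2.0],
        "01:00": [5.3, 9.2, 9.8],
        "02:00": [6.0, 4.5, 11.0],
        "23:00": [8.0, 11.2, 2.0],
        "avg_value": [5.5, 9.2, 6.4],
    },
    index=pd.Index([22, 37, 4469], name="ID"),  # assumption: ID is the index
)

# Every column except avg_value (hypothetical helper name).
hour_cols = df.columns.drop("avg_value")

# Reshape avg_value to a column vector so NumPy broadcasts it across
# all hourly columns of the same row.
mask = df[hour_cols].to_numpy() > df["avg_value"].to_numpy()[:, None]

# np.where works element-wise on the 2-D mask: 1 where True, 0 where False.
flags = pd.DataFrame(np.where(mask, 1, 0), index=df.index, columns=hour_cols)
result = flags.join(df["avg_value"])
print(result)
```

For a plain 0/1 result, mask.astype(int) would do the same job; np.where is more useful when the two replacement values are something other than 1 and 0.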
I build a new DataFrame here because if you assign back to the same columns, the integers are converted to floats (this is a bug):
df.iloc[:, :-1] = df.iloc[:, :-1].gt(df['avg_value'], axis=0).astype(int)
print (df)
00:00 01:00 02:00 23:00 avg_value
ID
22 0.0 0.0 1.0 1.0 5.5
37 0.0 0.0 0.0 1.0 9.2
4469 0.0 1.0 1.0 0.0 6.4
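If you still prefer to overwrite the existing frame, one possible workaround is to assign by column labels instead of through iloc. In recent pandas versions this replaces the columns rather than writing into the existing float data, so the integer dtype is usually kept. That behaviour depends on the pandas version, so treat this as a sketch and check df.dtypes on your own setup. The name hour_cols is only illustrative.

```python
import pandas as pd

# Sample data copied from the question; columns elided by "..." are omitted.
df = pd.DataFrame(
    {
        "00:00": [4.7, 0.0, 2.0],
        "01:00": [5.3, 9.2, 9.8],
        "02:00": [6.0, 4.5, 11.0],
        "23:00": [8.0, 11.2, 2.0],
        "avg_value": [5.5, 9.2, 6.4],
    },
    index=pd.Index([22, 37, 4469], name="ID"),  # assumption: ID is the index
)

# All columns except the last one, selected by label (hypothetical helper name).
hour_cols = df.columns[:-1]

# Label-based assignment of a whole DataFrame sets the columns one by one.
df[hour_cols] = df[hour_cols].gt(df["avg_value"], axis=0).astype(int)

print(df)
print(df.dtypes)  # inspect whether the hourly columns ended up as integers
```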