Setting NaN to rows in pandas dataframe based on column value
Using:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = pd.read_csv('file.csv', na_values=['-9999.0'], decimal=',')
a.index = pd.to_datetime(a[['Year', 'Month', 'Day', 'Hour', 'Minute']])
pd.options.mode.chained_assignment = None
The dataframe looks something like this:
Index A B C D
2016-07-20 18:00:00 9 4.0 NaN 2
2016-07-20 19:00:00 9 2.64 0.0 3
2016-07-20 20:00:00 12 2.59 0.0 1
2016-07-20 21:00:00 9 4.0 NaN 2
The main objective is to set the entire row to np.nan if the value in column A is 9 and the value in column D is 2 at the same time, for example:
Expected output:
Index A B C D
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9 2.64 0.0 3
2016-07-20 20:00:00 12 2.59 0.0 1
2016-07-20 21:00:00 NaN NaN NaN NaN
I would be thankful if someone could help.
Try this:
df.loc[df.A.eq(9) & df.D.eq(2)] = [np.nan] * len(df.columns)
Demo:
In [158]: df
Out[158]:
A B C D
Index
2016-07-20 18:00:00 9 4.00 NaN 2
2016-07-20 19:00:00 9 2.64 0.0 3
2016-07-20 20:00:00 12 2.59 0.0 1
2016-07-20 21:00:00 9 4.00 NaN 2
In [159]: df.loc[df.A.eq(9) & df.D.eq(2)] = [np.nan] * len(df.columns)
In [160]: df
Out[160]:
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
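Note that A and D change from integers to floats in the output above, because NaN is a floating-point value. As far as I know, recent pandas releases (2.1 and later) warn when an assignment silently upcasts an integer column like this, so a cautious variant is to cast to float first. A minimal sketch, assuming the sample frame is rebuilt by hand (the question reads it from file.csv instead):

```python
import numpy as np
import pandas as pd

# Rebuild the small example frame; values are copied from the demo above.
idx = pd.to_datetime([
    "2016-07-20 18:00:00",
    "2016-07-20 19:00:00",
    "2016-07-20 20:00:00",
    "2016-07-20 21:00:00",
])
df = pd.DataFrame(
    {
        "A": [9, 9, 12, 9],
        "B": [4.0, 2.64, 2.59, 4.0],
        "C": [np.nan, 0.0, 0.0, np.nan],
        "D": [2, 3, 1, 2],
    },
    index=idx,
)

# Cast up front so the integer columns can hold NaN without an implicit upcast.
df = df.astype(float)

# Rows where A == 9 and D == 2 at the same time.
rows = df.A.eq(9) & df.D.eq(2)

# A scalar NaN is broadcast across every column of the selected rows.
df.loc[rows] = np.nan

print(df)
print(df.dtypes)
```

Assigning the scalar np.nan should be equivalent here to the list form [np.nan] * len(df.columns) used above.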
Alternatively, we can use the DataFrame.where() method:
In [174]: df = df.where(~(df.A.eq(9) & df.D.eq(2)))
In [175]: df
Out[175]:
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
Use mask, which creates NaNs by default:
df = a.mask((a['A'] == 9) & (a['D'] == 2))
print (df)
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
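To make the relationship between mask and where concrete, here is a small sketch on a hand-built copy of the sample data (the default integer index is an assumption made for brevity). It also checks that mask returns a new frame and leaves the original untouched:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    {
        "A": [9, 9, 12, 9],
        "B": [4.0, 2.64, 2.59, 4.0],
        "C": [np.nan, 0.0, 0.0, np.nan],
        "D": [2, 3, 1, 2],
    }
)

cond = (a["A"] == 9) & (a["D"] == 2)

masked = a.mask(cond)   # NaN wherever cond is True
kept = a.where(~cond)   # keep values wherever ~cond is True, NaN elsewhere

# Both calls should produce the same frame (equals treats matching NaNs as equal).
same = masked.equals(kept)

# The original frame still holds its integer values.
original_unchanged = a["A"].tolist() == [9, 9, 12, 9]

print(masked)
print(same, original_unchanged)
```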
Or use boolean indexing and assign NaN:
a[(a['A'] == 9) & (a['D'] == 2)] = np.nan
print (a)
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
Timings:
np.random.seed(123)
N = 1000000
L = list('abcdefghijklmnopqrst'.upper())
a = pd.DataFrame(np.random.choice([np.nan,2,9], size=(N,20)), columns=L)
#jez2
In [256]: %timeit a[(a['A'] == 9) & (a['D'] == 2)] = np.nan
10 loops, best of 3: 25.8 ms per loop
#jez2upr
In [257]: %timeit a.loc[(a['A'] == 9) & (a['D'] == 2)] = np.nan
10 loops, best of 3: 27.6 ms per loop
#Wen
In [258]: %timeit a.mul(np.where((a.A==9)&(a.D==2),np.nan,1),0)
10 loops, best of 3: 90.5 ms per loop
#jez1
In [259]: %timeit a.mask((a['A'] == 9) & (a['D'] == 2))
1 loop, best of 3: 316 ms per loop
#maxu2
In [260]: %timeit a.where(~(a.A.eq(9) & a.D.eq(2)))
1 loop, best of 3: 318 ms per loop
#pir1
In [261]: %timeit a.where(a.A.ne(9) | a.D.ne(2))
1 loop, best of 3: 316 ms per loop
#pir2
In [263]: %timeit a[a.A.ne(9) | a.D.ne(2)].reindex(a.index)
1 loop, best of 3: 355 ms per loop
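One caveat about the numbers above: the two assignment variants modify a in place, so after the first %timeit loop there may be no matching rows left to overwrite, which could flatter those timings. A rough sketch of a fairer comparison, which copies the data inside the assignment variant and first checks that all approaches agree. The helper names and the smaller N are my own choices, not part of the original benchmark:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(123)
N = 10_000  # far smaller than the 1,000,000 rows used above, to keep this quick
cols = list("ABCDEFGHIJKLMNOPQRST")
base = pd.DataFrame(rng.choice([np.nan, 2, 9], size=(N, 20)), columns=cols)


def by_assignment(frame):
    out = frame.copy()  # copy so every run starts from the same data
    out[(out["A"] == 9) & (out["D"] == 2)] = np.nan
    return out


def by_mask(frame):
    return frame.mask((frame["A"] == 9) & (frame["D"] == 2))


def by_where(frame):
    return frame.where(frame.A.ne(9) | frame.D.ne(2))


def by_mul(frame):
    factor = np.where((frame.A == 9) & (frame.D == 2), np.nan, 1)
    return frame.mul(factor, axis=0)


candidates = {
    "assignment": by_assignment,
    "mask": by_mask,
    "where": by_where,
    "mul": by_mul,
}

# Check that every approach yields the same frame before timing anything.
reference = by_mask(base)
agree = {name: fn(base).equals(reference) for name, fn in candidates.items()}
print(agree)

for name, fn in candidates.items():
    seconds = timeit.timeit(lambda: fn(base), number=5)
    print(f"{name:>12}: {seconds / 5 * 1e3:.2f} ms per call")
```

The copy adds overhead to the assignment variant, so the resulting numbers are only indicative.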
Option 1
This is the opposite of @Jezrael's mask solution.
a.where(a.A.ne(9) | a.D.ne(2))
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
Option 2
pd.DataFrame.reindex
a[a.A.ne(9) | a.D.ne(2)].reindex(a.index)
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
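One thing to watch with the reindex approach: it looks rows up by label, so it appears to need a unique index on the filtered frame. A small sketch with a made-up frame (the repeated label 0 is hypothetical, not from the question) showing where succeeding and reindex raising:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index label 0 appears twice.
dup = pd.DataFrame({"A": [1, 1, 9], "D": [1, 1, 2]}, index=[0, 0, 1])
keep = dup.A.ne(9) | dup.D.ne(2)

# where() only needs the condition to line up with the frame, which it does here.
via_where = dup.where(keep)

# reindex() has to look labels up, and the filtered frame still has
# duplicate labels, so pandas raises a ValueError.
try:
    dup[keep].reindex(dup.index)
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(via_where)
print("reindex raised ValueError:", reindex_failed)
```

With a unique index, such as the timestamps in the question, both options should give the same result.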
Or you can try using .mul after np.where:
m = np.where((df.A==9)&(df.D==2), np.nan, 1)
df.mul(m, axis=0)
# one line: df.mul(np.where((df.A==9)&(df.D==2), np.nan, 1), axis=0)
A B C D
Index
2016-07-20 18:00:00 NaN NaN NaN NaN
2016-07-20 19:00:00 9.0 2.64 0.0 3.0
2016-07-20 20:00:00 12.0 2.59 0.0 1.0
2016-07-20 21:00:00 NaN NaN NaN NaN
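The axis argument matters for this approach. A short sketch on a hand-built copy of the sample data (default integer index assumed for brevity): with axis=0 the multipliers line up with rows, while the default aligns them with columns, which on a square 4x4 frame like this one runs without error but blanks out the wrong cells.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": [9, 9, 12, 9],
        "B": [4.0, 2.64, 2.59, 4.0],
        "C": [np.nan, 0.0, 0.0, np.nan],
        "D": [2, 3, 1, 2],
    }
)

# One multiplier per row: NaN for rows to blank out, 1 for rows to keep.
factor = np.where((df.A == 9) & (df.D == 2), np.nan, 1)

# axis=0 lines the multipliers up with the rows.
result = df.mul(factor, axis=0)
expected = df.mask((df.A == 9) & (df.D == 2))
matches = result.equals(expected)

# Without axis=0 the four multipliers are matched to the four columns instead,
# so columns A and D become NaN and B and C are left alone.
wrong = df.mul(factor)

print(result)
print(matches)
print(wrong)
```

Since this relies on multiplication, it presumably only suits frames whose columns are all numeric.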