How can I remove specific rows from a pandas DataFrame in an easy way?
I have a pandas DataFrame in Python. I want to remove a row if it meets any of three conditions. First, columns 1 to 6 and 10 to 15 are 'NA' in the row. Second, columns 1 to 3, 7 to 12 and 16 to 18 are 'NA'. Third, columns 4 to 9 and 13 to 18 are 'NA'. I wrote code to do this, but it didn't work.
The code is as follows:
data = pd.read_csv('data(2).txt', sep = "\t", index_col = 'tracking_id')
num = len(data) + 1
for i in range(num):
    if (data.iloc[i,[0:5,9:14]] == 'NA') | (data.iloc[i,[0:11,15:17]] == 'NA)'\
       | (data.iloc[i,[3:8,12:17]] == 'NA'):
        data = data.drop(data.index[i], axis = 0)
You can use:
import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,18)))
df.iloc[0, np.r_[0:5,9:14]] = np.nan
df.iloc[2, np.r_[0:11,15:17]] = np.nan
df.iloc[3:5, np.r_[3:8,12:17]] = np.nan
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 \
0 NaN NaN NaN NaN NaN 0.0 4.0 2.0 5.0 NaN NaN NaN NaN NaN 8.0
1 6.0 2.0 4.0 1.0 5.0 3.0 4.0 4.0 3.0 7.0 1.0 1.0 7.0 7.0 0.0
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.0 5.0 1.0 8.0
3 2.0 8.0 3.0 NaN NaN NaN NaN NaN 3.0 4.0 7.0 6.0 NaN NaN NaN
4 7.0 6.0 6.0 NaN NaN NaN NaN NaN 6.0 6.0 0.0 7.0 NaN NaN NaN
15 16 17
0 4.0 0.0 9
1 2.0 9.0 9
2 NaN NaN 4
3 NaN NaN 5
4 NaN NaN 4
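The sample uses NaN rather than the string 'NA'. With its default settings, read_csv already parses 'NA' as a missing value, so comparing cells with == 'NA' finds no matches in numeric columns. A minimal sketch of this behaviour, assuming a tab-separated file like the one in the question; the inline text `raw` and its columns `a` and `b` are hypothetical stand-ins for 'data(2).txt':

```python
import io

import pandas as pd

# Hypothetical two-row, tab-separated sample standing in for 'data(2).txt'.
raw = "tracking_id\ta\tb\ng1\tNA\t1.5\ng2\t2.0\tNA\n"

data = pd.read_csv(io.StringIO(raw), sep="\t", index_col="tracking_id")

# 'NA' was read as a missing value, so a string comparison matches nothing,
# while isnull() flags the two missing cells.
found_na_string = bool((data == "NA").any().any())
missing_cells = int(data.isnull().sum().sum())
print(found_na_string, missing_cells)
```

This is why the approach here tests for missing values with isnull instead of comparing against a string.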
First check whether the values are NaN with isnull, then select the columns with numpy.r_ and iloc, and use all to check whether all the selected values are True in each row. Then build the main mask with | (or). Finally, filter by boolean indexing, inverting the condition with ~:
mask = df.isnull()
m1 = mask.iloc[:, np.r_[0:5,9:14]].all(axis=1)
m2 = mask.iloc[:, np.r_[0:11,15:17]].all(axis=1)
m3 = mask.iloc[:, np.r_[3:8,12:17]].all(axis=1)
m = m1 | m2 | m3
print (m)
0 True
1 False
2 True
3 True
4 True
dtype: bool
df = df[~m]
print (df)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 \
1 6.0 2.0 4.0 1.0 5.0 3.0 4.0 4.0 3.0 7.0 1.0 1.0 7.0 7.0 0.0
15 16 17
1 2.0 9.0 9
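The slices in this answer mirror the ones in the question's code. If the column ranges in the question's text are instead read as 1-based and inclusive, "columns 1 to 6" would correspond to the zero-based slice 0:6 rather than 0:5. The following is a rough sketch under that assumption, not the answer's own code; `drop_rows_all_nan`, `groups` and `demo` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def drop_rows_all_nan(df, groups):
    """Drop rows in which every column of at least one group is NaN.

    groups: list of groups; each group is a list of (first, last)
    column numbers, assumed 1-based and inclusive.
    """
    mask = df.isnull()
    drop = pd.Series(False, index=df.index)
    for ranges in groups:
        # Convert 1-based inclusive ranges to zero-based positions.
        cols = np.concatenate(
            [np.arange(first - 1, last) for first, last in ranges]
        )
        drop |= mask.iloc[:, cols].all(axis=1)
    return df[~drop]


# The three conditions from the question, read as 1-based inclusive ranges.
groups = [
    [(1, 6), (10, 15)],
    [(1, 3), (7, 12), (16, 18)],
    [(4, 9), (13, 18)],
]

# Small float frame: rows 0-2 each satisfy one condition, row 3 is complete,
# row 4 is only partly missing and should be kept.
demo = pd.DataFrame(np.ones((5, 18)))
demo.iloc[0, np.r_[0:6, 9:15]] = np.nan
demo.iloc[1, np.r_[0:3, 6:12, 15:18]] = np.nan
demo.iloc[2, np.r_[3:9, 12:18]] = np.nan
demo.iloc[4, 0:5] = np.nan

result = drop_rows_all_nan(demo, groups)
print(result.index.tolist())
```

Keeping the ranges in one list makes it easier to compare them against the stated conditions than scattering slice literals through the code.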
# Alternatively, drop rows by their positions (a flat list, without extra brackets):
list_of_row_to_be_deleted = [1, 2]
df = df.drop(df.index[list_of_row_to_be_deleted])
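A small self-contained illustration of dropping by position may help here. It assumes the list holds row positions rather than index labels; `demo`, `rows_to_delete` and `trimmed` are hypothetical names. Note that drop returns a new DataFrame and leaves the original unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical frame with string labels, so positions and labels differ.
demo = pd.DataFrame({"x": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])

rows_to_delete = [1, 2]  # positions, not labels

# demo.index[rows_to_delete] turns positions into labels ('b', 'c'),
# which drop() then removes; demo itself is left untouched.
trimmed = demo.drop(demo.index[rows_to_delete])
print(trimmed.index.tolist())
```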