
How can I remove special rows from a pandas data frame in an easy way?

I have a pandas DataFrame in Python. I want to remove a row if it meets any of three conditions. First, columns 1 to 6 and 10 to 15 are 'NA' in the row. Second, columns 1 to 3, 7 to 12, and 16 to 18 are 'NA'. Third, columns 4 to 9 and 13 to 18 are 'NA'. I wrote code to do this, but it didn't work. The code is as follows:

data = pd.read_csv('data(2).txt',sep = "\t",index_col = 'tracking_id')
num = len(data) + 1
for i in range(num):
    if (data.iloc[i,[0:5,9:14]] == 'NA') | (data.iloc[i,[0:11,15:17]] == 'NA)'\
        | (data.iloc[i,[3:8,12:17]] == 'NA'):
        data = data.drop(data.index[i], axis = 0)

The data is in the link: enter link description here

You can use:

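import numpy as np
import pandas as pd

# Build a reproducible sample frame, then set blocks of columns to NaN
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,18)))

df.iloc[0, np.r_[0:5,9:14]] = np.nan
df.iloc[2, np.r_[0:11,15:17]] = np.nan
df.iloc[3:5, np.r_[3:8,12:17]] = np.nan
print(df)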
    0    1    2    3    4    5    6    7    8    9    10   11   12   13   14  \
0  NaN  NaN  NaN  NaN  NaN  0.0  4.0  2.0  5.0  NaN  NaN  NaN  NaN  NaN  8.0   
1  6.0  2.0  4.0  1.0  5.0  3.0  4.0  4.0  3.0  7.0  1.0  1.0  7.0  7.0  0.0   
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  2.0  5.0  1.0  8.0   
3  2.0  8.0  3.0  NaN  NaN  NaN  NaN  NaN  3.0  4.0  7.0  6.0  NaN  NaN  NaN   
4  7.0  6.0  6.0  NaN  NaN  NaN  NaN  NaN  6.0  6.0  0.0  7.0  NaN  NaN  NaN   

    15   16  17  
0  4.0  0.0   9  
1  2.0  9.0   9  
2  NaN  NaN   4  
3  NaN  NaN   5  
4  NaN  NaN   4  

First check which values are NaN with isnull, then select the columns with numpy.r_ and iloc, and use all to check whether all selected values are True in each row. Then build the main mask with | (or).
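As a small illustration of the column selection step, the sketch below shows what numpy.r_ produces for the slices used in this answer. The second selection is only an assumption about how the question's 1-based wording ("columns 1 to 6 and 10 to 15") would translate to 0-based slices, since a Python slice excludes its end point.

```python
import numpy as np

# np.r_ concatenates several slices into one flat array of integer positions,
# which iloc accepts as a column selector.
cols = np.r_[0:5, 9:14]
print(cols.tolist())  # [0, 1, 2, 3, 4, 9, 10, 11, 12, 13]

# Assumption: if "columns 1 to 6 and 10 to 15" is meant 1-based and inclusive,
# the matching 0-based slices would be 0:6 and 9:15.
cols_one_based = np.r_[0:6, 9:15]
print(cols_one_based.tolist())  # [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14]
```

A bare list such as [0:5,9:14] is not valid Python syntax, which is one reason the loop in the question cannot run as written.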

Finally, filter by boolean indexing, inverting the condition with ~:

mask = df.isnull()
# Each m is True for rows where every selected column is NaN
m1 = mask.iloc[:, np.r_[0:5,9:14]].all(axis=1)
m2 = mask.iloc[:, np.r_[0:11,15:17]].all(axis=1)
m3 = mask.iloc[:, np.r_[3:8,12:17]].all(axis=1)
m = m1 | m2 | m3
print(m)
0     True
1    False
2     True
3     True
4     True
dtype: bool

# Keep only the rows that match none of the conditions
df = df[~m]
print(df)
    0    1    2    3    4    5    6    7    8    9    10   11   12   13   14  \
1  6.0  2.0  4.0  1.0  5.0  3.0  4.0  4.0  3.0  7.0  1.0  1.0  7.0  7.0  0.0   

    15   16  17  
1  2.0  9.0   9  
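The same idea can be wrapped in a small helper. The sketch below is only an illustration: the function name drop_rows_all_na and the "list of 1-based inclusive ranges" convention are hypothetical, and the ranges are taken from the wording of the question rather than from the slices in the code above, so they may differ from what the asker actually intended.

```python
import numpy as np
import pandas as pd

def drop_rows_all_na(df, groups):
    """Drop rows in which every column of at least one group is NaN.

    groups: list of groups; each group is a list of (first, last) column
    ranges, 1-based and inclusive (a hypothetical convention for this sketch).
    """
    is_na = df.isna()
    drop = pd.Series(False, index=df.index)
    for ranges in groups:
        # Convert 1-based inclusive ranges to 0-based integer positions.
        positions = np.concatenate([np.arange(a - 1, b) for a, b in ranges])
        drop |= is_na.iloc[:, positions].all(axis=1)
    return df[~drop]

# Assumption: the three conditions as worded in the question.
groups = [
    [(1, 6), (10, 15)],
    [(1, 3), (7, 12), (16, 18)],
    [(4, 9), (13, 18)],
]

# Toy frame: rows 0, 1 and 2 each match one condition, row 3 matches none.
demo = pd.DataFrame(np.ones((4, 18)))
demo.iloc[0, np.r_[0:6, 9:15]] = np.nan
demo.iloc[1, np.r_[0:3, 6:12, 15:18]] = np.nan
demo.iloc[2, np.r_[3:9, 12:18]] = np.nan

result = drop_rows_all_na(demo, groups)
print(result.index.tolist())  # [3]
```

Note that pandas.read_csv parses the literal string NA as NaN by default, so testing with isna (or isnull) is appropriate here, whereas comparing against the string 'NA' would not match.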
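
Alternatively, if you already know the positions of the rows to delete, you can drop them by position:

list_of_row_to_be_deleted = [1, 2]
# df.index[...] takes a flat list of positions; drop returns a new DataFrame
df = df.drop(df.index[list_of_row_to_be_deleted])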
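The positions do not have to be hard-coded. As a minimal sketch on a made-up two-column frame (the column names a and b and the variable names are illustrative only), they can be derived from a boolean mask and passed to drop, which gives the same result as boolean indexing when the index labels are unique:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

# True for rows where every column is NaN
all_na = toy.isna().all(axis=1)

# Integer positions of the rows to delete, derived from the mask
rows_to_delete = np.flatnonzero(all_na.to_numpy())
print(rows_to_delete.tolist())  # [1]

# drop removes rows by label, so translate positions to labels first
out = toy.drop(toy.index[rows_to_delete])
print(out.index.tolist())  # [0, 2]
```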
