How to remove columns after any row has a NaN value in Python pandas dataframe
Let's say I have the following DataFrame:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[11,21,31], "B":[12,22,32], "C":[np.nan,23,33], "D":[np.nan,24,34], "E":[15,25,35]})
Which would return:
>>> df
    A   B     C     D   E
0  11  12   NaN   NaN  15
1  21  22  23.0  24.0  25
2  31  32  33.0  34.0  35
I know how to remove all the columns which have any row with a nan value, like this:
out1 = df.dropna(axis=1, how="any")
Which returns:
>>> out1
    A   B   E
0  11  12  15
1  21  22  25
2  31  32  35
However, what I want is to remove all columns from the first one that contains a nan value onward. In the toy example, the expected output would be:
    A   B
0  11  12
1  21  22
2  31  32
How can I remove all columns after a nan is found within any row in a pandas DataFrame?
What I would do: flag the NaN cells, take the cumulative sum along each row, then check any for every column, across the rows:
df.loc[:, ~df.isna().cumsum(axis=1).any(axis=0)]
Gives me:
    A   B
0  11  12
1  21  22
2  31  32
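To make it easier to see why the one-liner works, here is the same expression split into named steps. The intermediate names (is_nan, seen_nan, drop_mask, result) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32],
                   "C": [np.nan, 23, 33], "D": [np.nan, 24, 34],
                   "E": [15, 25, 35]})

# Step 1: boolean frame, True where a cell is NaN.
is_nan = df.isna()

# Step 2: running count of NaNs seen so far in each row, left to right.
# Once a row has hit a NaN, every later column in that row is >= 1.
seen_nan = is_nan.cumsum(axis=1)

# Step 3: flag a column if any row has a non-zero count there.
drop_mask = seen_nan.any(axis=0)

# Step 4: keep only the columns that were not flagged.
result = df.loc[:, ~drop_mask]
print(result)
```

Because the running count never goes back to zero within a row, the column holding the first NaN and every column to its right are flagged, even those (like E) that contain no NaN themselves.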
I found the following way to get the expected output:
colFirstNaN = df.isna().any(axis=0).idxmax() # Find column that has first NaN element in any row
indexColLastValue = df.columns.tolist().index(colFirstNaN) -1
ColLastValue = df.columns[indexColLastValue]
out2 = df.loc[:, :ColLastValue]
And the output would then be:
>>> out2
    A   B
0  11  12
1  21  22
2  31  32
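One caveat with the label-based version: if the very first column contains a NaN, the computed index becomes -1, which selects the last column label, so the slice keeps every column instead of none. A positional variant avoids this. The sketch below is only one possible way to write it, and the helper name drop_from_first_nan is hypothetical, not something from the posts above.

```python
import numpy as np
import pandas as pd


def drop_from_first_nan(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep only the columns to the left of the first column containing NaN."""
    # One boolean per column: True if that column has a NaN in any row.
    has_nan = frame.isna().any(axis=0).to_numpy()
    if not has_nan.any():
        return frame.copy()  # no NaN anywhere, so nothing to drop
    first_nan_pos = int(has_nan.argmax())  # position of the first True
    return frame.iloc[:, :first_nan_pos]   # positional slice, exclusive end


df = pd.DataFrame({"A": [11, 21, 31], "B": [12, 22, 32],
                   "C": [np.nan, 23, 33], "D": [np.nan, 24, 34],
                   "E": [15, 25, 35]})
out3 = drop_from_first_nan(df)
print(out3)
```

Slicing by position with iloc also sidesteps the lookup of a label's position through df.columns.tolist().index(...).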