Select rows that are non-null in a specific DataFrame column, and sub-select other columns

Question

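I have a DataFrame with several columns, and I selected some of them to create a variable like this: xtrain = df[['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]. From these columns I want to drop every row where the Survive column in the main DataFrame is NaN.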

Answer 1

You can pass a boolean mask to your df based on notnull() of the 'Survive' column and select the columns of interest:

In [2]:
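# make some data
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 7), columns=['Survive', 'Age', 'Fare', 'Group_Size', 'deck', 'Pclass', 'Title'])
# set one Survive value to NaN; .loc avoids chained assignment, and np.nan replaces np.NaN (removed in NumPy 2.0)
df.loc[2, 'Survive'] = np.nan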
df
Out[2]:
    Survive       Age      Fare  Group_Size      deck    Pclass     Title
0  1.174206 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  0.036843  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
2       NaN -0.132394 -0.236904   -0.324087  0.570660  0.758084 -0.176421
3 -2.145934 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.197144 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482

Now pass a mask to loc to take only the non-NaN rows:

In [3]:
xtrain = df.loc[df['Survive'].notnull(), ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
xtrain

Out[3]:
        Age      Fare  Group_Size      deck    Pclass     Title
0 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
3 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482
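
For a version that can be rerun with the same result, here is a minimal sketch of the same mask-plus-loc selection on a tiny hand-written frame. The values and the ytrain variable are illustrative assumptions and do not come from the question; only the 'Survive', 'Age' and 'Fare' column names are taken from it.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (illustrative values, not from the original post)
df = pd.DataFrame({
    "Survive": [1.0, 0.0, np.nan, 1.0],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

mask = df["Survive"].notnull()          # True where Survive is present
xtrain = df.loc[mask, ["Age", "Fare"]]  # rows chosen by mask, columns by label
ytrain = df.loc[mask, "Survive"]        # hypothetical target, same mask keeps it aligned

print(xtrain)
```

Reusing one mask for both selections keeps the feature rows and the target rows on the same index, which is probably what you want if xtrain is meant for model fitting.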

Answer 2

Two alternatives because... well, why not?
Both drop NaN prior to column slicing. That's two calls rather than EdChum's one call.

one

df.dropna(subset=['Survive'])[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]

two

df.query('Survive == Survive')[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
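
The query version may look odd at first: it relies on NaN never comparing equal to itself, so 'Survive == Survive' is False exactly on the rows where Survive is missing. The sketch below, on a small made-up frame (the values are assumptions for illustration), checks that both alternatives select the same rows as the notnull() mask.

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the original post
df = pd.DataFrame({"Survive": [1.0, np.nan, 0.0], "Age": [22.0, 38.0, 26.0]})

# NaN is never equal to itself, so this is False only on the missing row
self_equal = df["Survive"] == df["Survive"]

via_query = df.query("Survive == Survive")[["Age"]]
via_dropna = df.dropna(subset=["Survive"])[["Age"]]
via_mask = df.loc[df["Survive"].notnull(), ["Age"]]

print(via_query.equals(via_dropna) and via_query.equals(via_mask))
```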

Answer 3

Of course, the answers already given are correct. Here is a simple one-liner that works as well.

xtrain = df[df['Survive'].notnull()][['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
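
This one-liner indexes twice (rows first, then columns), whereas loc does both in one step; for reading data the result is the same. A small sketch on made-up values (an assumption for illustration) compares the two, and takes an explicit copy in case xtrain is modified afterwards, so that it is clearly separate from df.

```python
import numpy as np
import pandas as pd

# Illustrative data, not from the original post
df = pd.DataFrame({
    "Survive": [1.0, np.nan, 0.0],
    "Age": [22.0, 38.0, 26.0],
    "Fare": [7.25, 71.28, 7.92],
})
cols = ["Age", "Fare"]

chained = df[df["Survive"].notnull()][cols]     # two indexing steps
single = df.loc[df["Survive"].notnull(), cols]  # one indexing step

# Explicit copy so later edits to xtrain do not touch df
xtrain = single.copy()
xtrain["Age"] = xtrain["Age"] / 10

print(chained.equals(single))
```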

3 answers

Answer 1
12 accepted 2016-12-27 00:09:02

Answer 2
3 2016-12-27 00:27:48

Answer 3
1 2018-06-24 11:40:52
