
Select non-null rows from a specific column in a DataFrame and take a sub-selection of other columns

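I have a DataFrame with several columns, and I selected some of them to create a variable like this: xtrain = df[['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]. From these columns I want to drop every row where the Survive column in the main DataFrame is NaN.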

You can build a boolean mask from notnull() on the 'Survive' column, pass it to your df, and select the columns of interest:

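In [2]:
# make some data
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,7), columns=['Survive', 'Age', 'Fare', 'Group_Size', 'deck', 'Pclass', 'Title'])
# set one 'Survive' value to NaN (df.loc avoids chained assignment)
df.loc[2, 'Survive'] = np.nan
df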
Out[2]:
    Survive       Age      Fare  Group_Size      deck    Pclass     Title
0  1.174206 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  0.036843  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
2       NaN -0.132394 -0.236904   -0.324087  0.570660  0.758084 -0.176421
3 -2.145934 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.197144 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482

Now pass the mask to loc to keep only the non-NaN rows:

In [3]:
xtrain = df.loc[df['Survive'].notnull(), ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
xtrain

Out[3]:
        Age      Fare  Group_Size      deck    Pclass     Title
0 -0.056846  0.454437    0.496695  1.401509 -2.078731 -1.024832
1  1.060134  0.770625   -0.114912  0.118991 -0.317909  0.061022
3 -0.020003 -0.777785    0.835467  1.498284 -1.371325  0.661991
4 -0.089806 -0.706548    1.621260  1.754292  0.725897  0.860482
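
For anyone who wants to check the mask step on data that does not change between runs, here is a minimal sketch. The toy values and the reduced column list are made up for illustration and are not from the original post:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Survive": [1.0, 0.0, np.nan, 1.0, 0.0],
    "Age": [22, 38, 26, 35, 54],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

# Boolean Series: True where 'Survive' is present, False where it is NaN.
mask = df["Survive"].notnull()

# Row selection (mask) and column selection (list) in a single loc call.
xtrain = df.loc[mask, ["Age", "Fare"]]

print(mask.tolist())          # [True, True, False, True, True]
print(xtrain.index.tolist())  # [0, 1, 3, 4]
```

The original index labels are kept, so row 2 is simply missing from xtrain.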

Two alternatives because... well why not?
Both drop the NaN rows before slicing the columns. That's two calls rather than EdChum's one call.

one

df.dropna(subset=['Survive'])[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]

two

df.query('Survive == Survive')[
    ['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
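
The query version relies on NaN never comparing equal to itself, so 'Survive == Survive' is False exactly on the NaN rows. A small sketch, on made-up toy data with a shortened column list (both are assumptions for illustration), suggesting that the loc, dropna and query routes give the same frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Survive": [1.0, np.nan, 0.0],
    "Age": [22, 38, 26],
    "Fare": [7.25, 71.28, 7.92],
})
cols = ["Age", "Fare"]

# NaN is not equal to itself, which is what the query trick exploits.
nan_equals_itself = (np.nan == np.nan)

via_loc = df.loc[df["Survive"].notnull(), cols]
via_dropna = df.dropna(subset=["Survive"])[cols]
via_query = df.query("Survive == Survive")[cols]

# DataFrame.equals compares values, dtypes and index labels.
all_same = via_loc.equals(via_dropna) and via_loc.equals(via_query)

print(nan_equals_itself)  # False
print(all_same)           # True
```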

Of course, the answers already given are correct. Here is a simple one-liner that works as well.

xtrain = df[df['Survive'].notnull()][['Age','Fare', 'Group_Size','deck', 'Pclass', 'Title' ]]
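
A quick sketch, again on made-up toy data, of two points about this one-liner: for reading, the chained form returns the same frame as a single loc call, and column labels are case-sensitive, so the label has to match the column name exactly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration only.
df = pd.DataFrame({
    "Survive": [1.0, np.nan, 0.0],
    "Age": [22, 38, 26],
})

# Chained form: filter rows first, then select columns on the result.
chained = df[df["Survive"].notnull()][["Age"]]
# Single-call form with loc.
single = df.loc[df["Survive"].notnull(), ["Age"]]
same = chained.equals(single)

# A lowercase label does not match the 'Survive' column.
try:
    df["survive"]
    lowercase_ok = True
except KeyError:
    lowercase_ok = False

print(same)          # True
print(lowercase_ok)  # False
```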
