Drop rows from a pandas Series based on a list

I would like to generate a list of the index labels of the NaN values in one DataFrame and then use that list to remove the corresponding rows from a Series. The goal is for the DataFrame and the Series to end up with the same number of rows.

However, I keep getting stuck on the last part. If I use drop(), I get a "not found in axis" error. I have tried isin(), but I don't seem to get the right results.

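# create a list of the index labels of the NaN values in GarageYrBlt
Index_nan_train = X_train[X_train['GarageYrBlt'].isna()].index.tolist()
# drop the NaN rows in GarageYrBlt from X_train
# (DataFrame.drop() has no subset argument; dropna() does)
X_train = X_train.dropna(subset=['GarageYrBlt'])
# use the list to drop the same rows from y_train; passing the quoted
# string ['Index_nan_train'] instead of the list raises "not found in axis"
y_train = y_train.drop(Index_nan_train)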

Edit: To add further details, the data is from the Kaggle exercise on dealing with missing values in the Intermediate Machine Learning course.

X_train is a (1168, 36) DataFrame with the input features, and y_train is a (1168,) Series representing the sale price.
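A minimal sketch of what goes wrong in the last step and how to fix it, assuming y_train shares X_train's index labels (as it does after train_test_split). The frames below are hypothetical toy stand-ins for the Kaggle data, not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the shuffled index mimics what train_test_split leaves behind
X_train = pd.DataFrame(
    {"GarageYrBlt": [2001.0, np.nan, 1995.0, np.nan], "LotArea": [8450, 9600, 11250, 9550]},
    index=[10, 3, 7, 1],
)
y_train = pd.Series([208500, 181500, 223500, 140000], index=[10, 3, 7, 1], name="SalePrice")

# Index labels of the rows where GarageYrBlt is NaN
Index_nan_train = X_train[X_train["GarageYrBlt"].isna()].index.tolist()

# Quoting the variable name passes the literal string as a label,
# which reproduces the "not found in axis" KeyError
try:
    y_train.drop(["Index_nan_train"])
except KeyError as err:
    print("KeyError:", err)

# dropna(subset=...) removes the NaN rows from the DataFrame;
# drop(list_of_labels) removes the same labels from the Series
X_train = X_train.dropna(subset=["GarageYrBlt"])
y_train = y_train.drop(Index_nan_train)

print(X_train.shape, y_train.shape)  # (2, 2) (2,)
```

The only real change is that drop() receives the list of labels itself rather than a one-element list containing the variable's name.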

When you apply the same transformations to both, an easier approach is to keep them together until you are done cleaning and actually use the data:

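import pandas as pd

x_cols = X_train.columns
y_col = y_train.name
# join features and target column-wise (aligned on the index)
combined = pd.concat((X_train, y_train), axis=1)
# drop the rows where GarageYrBlt is NaN from both at once
combined = combined.dropna(subset=["GarageYrBlt"])

# split back into features and target
X_train, y_train = combined.loc[:, x_cols], combined.loc[:, y_col]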
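A self-contained sketch of the concat approach on hypothetical toy data. It assumes y_train has a name (here SalePrice); an unnamed Series would become column 0 after the concat and y_train.name would be None, so the final split would need adjusting:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames sharing the same (shuffled) index
X_train = pd.DataFrame(
    {"GarageYrBlt": [2001.0, np.nan, 1995.0], "LotArea": [8450, 9600, 11250]},
    index=[5, 2, 8],
)
y_train = pd.Series([208500, 181500, 223500], index=[5, 2, 8], name="SalePrice")

x_cols = X_train.columns
y_col = y_train.name

# axis=1 aligns the Series with the DataFrame on index labels
combined = pd.concat((X_train, y_train), axis=1)
# one dropna call filters features and target together
combined = combined.dropna(subset=["GarageYrBlt"])

X_train, y_train = combined.loc[:, x_cols], combined.loc[:, y_col]
print(X_train.index.tolist(), y_train.index.tolist())  # [5, 8] [5, 8]
```

Because both pieces come out of the same filtered frame, they keep identical index labels and row order.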

Alternatively, use the Index.difference method:

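# index labels of the rows where GarageYrBlt is NaN
nan_idx = X_train.loc[X_train["GarageYrBlt"].isna()].index
# all remaining labels; sort=False keeps X_train's row order
# (the default sort=None would return them in ascending order)
notna_idx = X_train.index.difference(nan_idx, sort=False)

# keep only those rows in y_train
y_train = y_train.loc[notna_idx]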
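By default Index.difference sorts its result (sort=None), so y_train.loc[notna_idx] comes back in ascending label order while X_train.dropna(...) keeps its original, shuffled order. pandas aligns by label, so that is harmless inside pandas, but scikit-learn's fit() pairs rows by position. The sketch below uses hypothetical toy data to show the ordering difference and passes sort=False. It also shows a boolean-mask variant, which is an alternative added here rather than part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a non-sorted index
X_train = pd.DataFrame({"GarageYrBlt": [np.nan, 2001.0, 1995.0, 1980.0]}, index=[9, 4, 7, 1])
y_train = pd.Series([100, 200, 300, 400], index=[9, 4, 7, 1], name="SalePrice")

nan_idx = X_train.loc[X_train["GarageYrBlt"].isna()].index

# default sort=None sorts the remaining labels
print(X_train.index.difference(nan_idx).tolist())  # [1, 4, 7]

# sort=False keeps X_train's own row order
notna_idx = X_train.index.difference(nan_idx, sort=False)
y_kept = y_train.loc[notna_idx]
print(notna_idx.tolist())  # [4, 7, 1]

# Boolean-mask variant: notna() yields a mask on the same index,
# so filtering both objects with it keeps them aligned row by row
mask = X_train["GarageYrBlt"].notna()
X_clean, y_clean = X_train[mask], y_train[mask]
```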

NB: Slicing by index labels only works while X_train and y_train share the same index. If you change the index of either one (for example with groupby or reset_index), the labels no longer line up, so be aware of that problem.
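A small sketch of that pitfall with hypothetical toy data: once only one object goes through reset_index, its labels become 0..n-1 and no longer match the other's. One safe ordering, shown at the end, is to filter both by label first and reset both indexes afterwards:

```python
import pandas as pd

# Hypothetical toy data sharing the index [12, 30, 7]
X = pd.DataFrame({"GarageYrBlt": [2001.0, None, 1995.0]}, index=[12, 30, 7])
y = pd.Series([1, 2, 3], index=[12, 30, 7], name="SalePrice")

# reset_index on only one object replaces its labels with 0..n-1
X_reset = X.dropna(subset=["GarageYrBlt"]).reset_index(drop=True)
print(X_reset.index.tolist())  # [0, 1]

# y still uses the original labels, so none of X_reset's labels are found
print(y.index.isin(X_reset.index).any())  # False

# Safer: select by label while the indexes still match, then reset both
X_clean = X.dropna(subset=["GarageYrBlt"])
y_clean = y.loc[X_clean.index]
X_clean, y_clean = X_clean.reset_index(drop=True), y_clean.reset_index(drop=True)
```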
