
How do I remove every row in a DataFrame that has a value above a certain threshold?

I have a bankruptcy prediction dataset from https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction and am trying to repurpose it for predicting the Net Value Growth Rate. I noticed that the mean squared error is much higher than it should be, which I assume is caused by the large amount of incorrect data.

I tried some of the approaches shared by users on Kaggle, but didn't have much success, because I'm building a model that predicts a continuous value rather than a classification model (and maybe also because of my limited programming skills).

I already removed the columns that contain only binary values, along with some others that had too much wrong data, but I would also like to automatically remove every row in which any value is above 1. This should improve the prediction model considerably, because all the columns I'm using should only contain values below 1 anyway.
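The question doesn't show how the binary-only columns were removed. One possible way, assuming "binary" means every value in the column is 0 or 1, is to test each column with isin() and keep only the columns that fail that test. The toy frame and its column names below are hypothetical stand-ins for the Kaggle data:

```python
import pandas as pd

# Hypothetical toy frame standing in for the Kaggle data.
data = pd.DataFrame({
    "ratio_a": [0.2, 0.5, 0.9],
    "flag_b": [0, 1, 0],       # binary-only column
    "ratio_c": [0.1, 1.7, 0.4],
})

# True for each column whose values are all 0 or 1.
is_binary = data.isin([0, 1]).all()

# Keep only the columns that are not binary-only.
data = data.loc[:, ~is_binary]

print(list(data.columns))  # ['ratio_a', 'ratio_c']
```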

I used the following line to do this for the column "Net Value Growth Rate":

data = data[data[' Net Value Growth Rate'].between(0, 1)]

This obviously only works for that one column; the others remain untouched. Is there a way to remove every row in the dataset that contains a value above 1?
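A small, made-up frame shows why the single-column filter is not enough: between(0, 1) is inclusive on both ends and only looks at the column it is called on, so a row can survive with an out-of-range value in another column. The column " Other Ratio" and all the values are hypothetical:

```python
import pandas as pd

# Hypothetical toy frame; only the leading-space column name mirrors the question.
data = pd.DataFrame({
    " Net Value Growth Rate": [0.3, 0.8, 2.5],
    " Other Ratio": [0.4, 3.0, 0.2],
})

# between() is inclusive on both ends by default, so this keeps 0 <= x <= 1.
data = data[data[" Net Value Growth Rate"].between(0, 1)]

# Row 2 is dropped, but row 1 survives even though " Other Ratio" is 3.0 there.
print(data)
```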

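Try a simple mask. On its own, data[data<=1] does not drop any rows: it keeps the DataFrame's shape and replaces every value above 1 with NaN. To remove the rows, require the condition to hold in every column:

data = data[(data <= 1).all(axis=1)]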
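A minimal sketch on a hypothetical toy frame, contrasting the plain mask with the row filter, plus a variant that applies the same inclusive 0 to 1 bounds as between(0, 1) to every column:

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 each have one value above 1,
# and row 4 has a negative value.
data = pd.DataFrame({
    "a": [0.3, 0.8, 2.5, 0.1, -0.5],
    "b": [0.4, 3.0, 0.2, 0.9, 0.6],
})

# data[data <= 1] keeps the shape and only blanks out offending cells.
masked = data[data <= 1]
print(masked.isna().sum().sum())  # 2 cells replaced by NaN, no rows removed

# To drop whole rows, require the condition to hold in every column.
filtered = data[(data <= 1).all(axis=1)]

# Same idea with the question's inclusive 0..1 bounds on every column.
bounded = data[((data >= 0) & (data <= 1)).all(axis=1)]

print(filtered)
print(bounded)
```

Because NaN compares as False, the row filter also drops rows that contain missing values. If the frame has non-numeric columns, data <= 1 may raise a TypeError; comparing data.select_dtypes("number") instead restricts the check to the numeric columns.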
