How do I remove every row in a dataframe which has a value above a certain threshold?

Question

I have a bankruptcy prediction dataset from https://www.kaggle.com/fedesoriano/company-bankruptcy-prediction and am trying to repurpose it for predicting the Net Value Growth Rate. I noticed that the mean squared error is much higher than it should be, which I assume is because of the large number of erroneous values in the data.

I tried some of the approaches posted by users on Kaggle, but didn't have much success, because I'm building a prediction model instead of a classification model (and maybe also because of my limited programming skills).

I already removed the columns with only binary values and some others with too much bad data, but I would also like to automatically remove every row in which any value is above 1. This should improve the prediction model considerably, because all the columns I'm using should only contain values below 1 anyway.

I used the following line to do this for the column "Net Value Growth Rate":

data = data[data[' Net Value Growth Rate'].between(0, 1)]

This obviously only works for that one column, while the others remain untouched. Is there some way to remove every row in the dataset that contains a value above 1?

Answer 1

Try a simple mask:

data[data<=1]
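
Note that the mask on its own keeps the DataFrame's shape and only turns the values above 1 into NaN. To actually drop the rows, the boolean frame can be reduced per row with .all(axis=1). Here is a minimal sketch of that, assuming every remaining column is numeric; the toy frame and the column names "a" and "b" are made up for illustration and are not from the Kaggle dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
data = pd.DataFrame({
    "a": [0.2, 1.5, 0.7, 0.1],
    "b": [0.9, 0.3, 2.0, 0.4],
})

# The mask alone: same shape as data, values above 1 become NaN
masked = data[data <= 1]

# Keep only the rows in which every column is <= 1
filtered = data[(data <= 1).all(axis=1)]

print(masked)
print(filtered)
```

In this sketch, rows 1 and 2 each contain one value above 1, so only rows 0 and 3 survive in filtered. Rows that already contain NaN are dropped as well, since a comparison with NaN evaluates to False. If values below 0 should also be excluded, as in the between(0, 1) call from the question, the condition could be written as ((data >= 0) & (data <= 1)).all(axis=1).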
