
Why doesn't this Python pandas code work on my dataset?

I am a newbie in data science, and I ran into a problem with pandas in Python. Basically, I want to replace the values lower than 0 in a column with 0, and I wonder why this does not work:

Image of my dataset: [image: dataset]

Original:

submit[submit.score<0].score = 0

Fixed:

submit.loc[submit.score<0, 'score'] = 0

I have already solved this problem by using loc, but it really confuses me. Any explanation would be great.

Your first attempt is equivalent to submit[submit['score'] < 0]['score'] = 0. Whenever you see two consecutive [ ] indexing operations like this in your pandas code (chained indexing), it is usually a bad sign. In this case, submit[submit['score'] < 0] returns a copy of the matching rows, so you are assigning 0 to the score column of that temporary copy. The copy is then discarded, and submit itself is never modified.

By using loc, you select the rows and the column in a single operation, so there is no intermediate copy and the assignment goes directly to the original dataframe.
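To make the difference concrete, here is a minimal sketch on a made-up dataframe. The id column and the score values are assumptions for illustration only; the real dataset is only shown as an image in the question.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the asker's data (values are made up).
submit = pd.DataFrame({"id": [1, 2, 3, 4], "score": [0.5, -1.2, 3.0, -0.1]})

# Chained indexing: the boolean mask builds a new object, and the
# assignment is applied to that temporary object only.
chained = submit.copy()
with warnings.catch_warnings():
    # pandas may warn about assigning to a copy here; silenced for the demo.
    warnings.simplefilter("ignore")
    chained[chained.score < 0].score = 0
print((chained.score < 0).sum())  # negative values are still present

# Single .loc call: rows and column are selected together, so the
# assignment is applied to the dataframe itself.
fixed = submit.copy()
fixed.loc[fixed.score < 0, "score"] = 0
print((fixed.score < 0).sum())  # no negative values remain
```

The first print reports that the negative scores survived the chained assignment, while the second shows that the .loc version replaced them.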

Using .loc is good, like the sibling answer says.

Sometimes it is even better to chain operations that create new objects instead of mutating an existing one in place. This tends to produce code that is easy to read and follow.

I would suggest the following:

submit = submit.assign(score=submit.score.clip(0, None))

It's still just one line, but it makes a new dataframe with the score column replaced. The .clip() method clamps values into an interval; here the lower bound is 0 and there is no upper bound, so anything less than 0 becomes 0 and everything else is left as is.
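As a quick check of that behaviour, the sketch below runs the same idea on made-up values (the numbers are assumptions, not the asker's data). It uses the keyword form clip(lower=0), which should behave the same as clip(0, None).

```python
import pandas as pd

# Hypothetical scores for illustration.
submit = pd.DataFrame({"score": [0.5, -1.2, 3.0, -0.1]})

# assign() returns a new dataframe; the original is left untouched.
clipped = submit.assign(score=submit.score.clip(lower=0))

print(clipped.score.tolist())  # negatives replaced by 0
print(submit.score.tolist())   # original values unchanged
```

Because nothing is mutated, you only see the change once you rebind the name, as in submit = submit.assign(...).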

This style makes it easy to add more operations in a chain (a style also seen in other places).
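A chain in this style might look like the following sketch. The extra steps (sorting and resetting the index) and the id column are hypothetical, chosen only to show how further operations slot in.

```python
import pandas as pd

# Hypothetical data for illustration.
submit = pd.DataFrame({"id": [1, 2, 3, 4], "score": [0.5, -1.2, 3.0, -0.1]})

result = (
    submit
    # A callable receives the dataframe at that point in the chain.
    .assign(score=lambda df: df.score.clip(lower=0))
    # Hypothetical follow-up steps: highest score first, fresh index.
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(result)
```

Each step returns a new dataframe, so the chain reads top to bottom and submit itself is not modified.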

