Suppose I have a dataset like the one below:
When I try to overwrite a specific column (a Series object), I get an error with the following code:
mask = bond["Actor"] == "Sean Connery"
bond[mask]["Actor"] = "Sir Sean Connery"
But the moment I move one level down and instead edit all the columns of those rows (the complete DataFrame), I succeed:
mask = bond["Actor"] == "Sean Connery"
bond[mask] = "Sir Sean Connery"
Why is that so? In the first case, I thought that it's not logical to edit a copy, hence the error. But the same should apply in the latter case too, as the second example should also return a copy of the original DataFrame.
The problem is chained indexing; to avoid it, you need loc:
import pandas as pd

bond = pd.DataFrame({'Actor': list('abcaef'),
                     'A': list('efghij'),
                     'B': list('aaabbb')})
print(bond)
A Actor B
0 e a a
1 f b a
2 g c a
3 h a b
4 i e b
5 j f b
mask = bond["Actor"] == "a"
bond.loc[mask] = "AAA"
# ':' selects all columns; for columns it can be omitted
# bond.loc[mask, :] = "AAA"
print(bond)
A Actor B
0 AAA AAA AAA
1 f b a
2 g c a
3 AAA AAA AAA
4 i e b
5 j f b
# only the Actor column (starting again from the original bond)
bond.loc[mask, "Actor"] = "AAA"
print(bond)
A Actor B
0 e AAA a
1 f b a
2 g c a
3 h AAA b
4 i e b
5 j f b
Consider the following single column DataFrame:
df = pd.DataFrame({'Actor': ['Sean Connery', 'Sean Connery',
'Sean Something', 'Sean Something Else']})
df
Out:
Actor
0 Sean Connery
1 Sean Connery
2 Sean Something
3 Sean Something Else
And this is the mask that you want to use for slicing:
mask = df['Actor'] == 'Sean Connery'
Now, if I use df[mask]['Actor'] = 'Sir Sean Connery', this will be executed:
df.__getitem__(mask).__setitem__('Actor', 'Sir Sean Connery')
__main__:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
And for this case it will not modify the original DataFrame:
df
Out:
Actor
0 Sean Connery
1 Sean Connery
2 Sean Something
3 Sean Something Else
It did modify a DataFrame, though: the one returned by the __getitem__ method. Since that object was not assigned to anything, it is lost.
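To make the lost intermediate object visible, here is a minimal sketch that stores the result of df[mask] in a variable before assigning to it. The name subset is only for illustration, and the explicit .copy() is an addition that states the intent and avoids the warning; it is not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'Actor': ['Sean Connery', 'Sean Connery',
                             'Sean Something', 'Sean Something Else']})
mask = df['Actor'] == 'Sean Connery'

# Boolean indexing returns a new DataFrame; .copy() makes that explicit
subset = df[mask].copy()
subset['Actor'] = 'Sir Sean Connery'

print(subset['Actor'].tolist())  # the selected rows, modified
print(df['Actor'].tolist())      # the original frame, unchanged
```

In the chained form df[mask]['Actor'] = ..., the same kind of intermediate object is created, but nothing keeps a reference to it.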
Instead, in your second example (df[mask] = 'Sir Sean Connery') the code executed is:
df.__setitem__(mask, 'Sir Sean Connery')
Because of the mask you probably think it uses __getitem__ too, but it does not. It calls __setitem__ directly on df and passes the mask to it. pandas guarantees that __setitem__ operates on the original object, so the assignment is sure to take effect. With __getitem__ there is no such guarantee: it may return a copy or a view, and it is hard to know which.
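The dispatch described above is plain Python behaviour, not something specific to pandas. As a rough illustration, the hypothetical Traced class below (not a pandas class) records which special methods are called for each form of assignment.

```python
class Traced:
    """Toy container that records the indexing calls made on it."""

    def __init__(self):
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("getitem", key))
        # Return a fresh, unrelated object, playing the role of a copy
        return Traced()

    def __setitem__(self, key, value):
        self.calls.append(("setitem", key, value))


# Chained form: obj[a][b] = v calls __getitem__ on obj,
# then __setitem__ on the temporary object that was returned
obj = Traced()
obj["mask"]["Actor"] = "x"
chained_calls = list(obj.calls)

# Direct form: obj[a] = v calls __setitem__ on obj itself
obj2 = Traced()
obj2["mask"] = "x"
direct_calls = list(obj2.calls)

print(chained_calls)
print(direct_calls)
```

In the chained case the original object only ever sees a __getitem__ call; the write goes to the temporary object, which is discarded straight away.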
Now you'll see that the original df is modified:
df
Out:
Actor
0 Sir Sean Connery
1 Sir Sean Connery
2 Sean Something
3 Sean Something Else
There is one catch, though. It worked because we only had one column. If we had another column, say 'Year', it would set the corresponding Year values to 'Sir Sean Connery' too. To avoid that, we use .loc, as jezrael pointed out. It also calls the __setitem__ method, and it lets you specify which columns will change.
df = pd.DataFrame({'Actor': ['Sean Connery', 'Sean Connery',
'Sean Something', 'Sean Something Else'],
'Year': [1990, 1990, 1990, 1990]})
df.loc.__setitem__((mask, 'Actor'), 'Sir Sean Connery')
df
Out:
Actor Year
0 Sir Sean Connery 1990
1 Sir Sean Connery 1990
2 Sean Something 1990
3 Sean Something Else 1990
As a result, the best practice for setting values based on a mask and column name(s) is to use .loc:
df.loc[mask, 'Actor'] = 'Sir Sean Connery'
This way you don't have to worry if you are operating on a copy.