
Python Pandas indexing

Sorry if this is a simple question. I've tried to look for a solution but can't find anything.

My code goes like this:

  • given zip1, build a boolean mask that selects the observations (other zip codes) whose distance has not been calculated yet (marked with the placeholder value 666)

     I = (df['zip1'] == zip1) & (df['Distances'] == 666) 
  • perform some calculation

     distances = calc(zip1,df['zip2'][I]) 

So far so good: I've checked the distances variable, and it holds the correct values in a correctly sized array.

  • put the distance variable in the right place

     df['Distances'][I] = distances 

But this last step sets df['Distances'] to nonsense values for all observations with df['zip1'] == zip1, instead of only the ones selected by I.

I've checked the boolean array I before the df['Distances'][I] = distances command and it looks fine. Any ideas would be greatly appreciated.

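What you are attempting is called chained assignment, and it does not work the way you expect. df['Distances'][I] = distances is two separate indexing operations: the first, df['Distances'], may return a copy rather than a view, so the second one assigns into an intermediate object and the result in df is not reliable. That is the cause of the behaviour you see.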

There is more information about it here, and related issues are discussed in this and this.
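To make the two-step nature of chained assignment concrete, here is a minimal sketch. The sample data and the names mask and column are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data; 666 marks a distance that is not computed yet.
df = pd.DataFrame({"zip1": [10001, 10001, 20002],
                   "Distances": [666.0, 2.0, 666.0]})
mask = (df["zip1"] == 10001) & (df["Distances"] == 666)

# df["Distances"][mask] = 1.0 is evaluated as two separate operations:
column = df["Distances"]   # step 1: __getitem__, may hand back a view or a copy
column[mask] = 1.0         # step 2: __setitem__ on that intermediate object

# The intermediate object always receives the new value ...
print(column.tolist())
# ... but whether df sees it depends on the pandas version and settings.
print(df["Distances"].tolist())
```

Because step 2 only knows about the intermediate object, pandas cannot guarantee that the write reaches df.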

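So you should use .loc, which takes the row selector and the column label in a single indexing call (.ix also worked at the time, but it has since been deprecated and was removed in pandas 1.0):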

     df.loc[I, 'Distances'] = distances
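For reference, here is a small end-to-end sketch of the .loc approach. The DataFrame contents and the body of calc are assumptions, since the question does not show them; calc here is only a stand-in that returns one value per selected row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 666 marks distances that are not computed yet.
df = pd.DataFrame({
    "zip1": [10001, 10001, 10001, 20002],
    "zip2": [10002, 10003, 10004, 20003],
    "Distances": [666.0, 1.5, 666.0, 666.0],
})

def calc(zip1, zip2_values):
    # Stand-in for the real distance calculation (an assumption):
    # returns one float per selected row.
    return np.abs(zip2_values.to_numpy() - zip1).astype(float)

zip1 = 10001
mask = (df["zip1"] == zip1) & (df["Distances"] == 666)

distances = calc(zip1, df.loc[mask, "zip2"])

# One .loc call with the row mask and the column label together, so the
# assignment is applied to df itself and only to the masked rows.
df.loc[mask, "Distances"] = distances
print(df)
```

Only the two rows with zip1 equal to 10001 and a placeholder distance are overwritten; the already computed row and the row for the other zip code are left alone.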
