简体   繁体   中英

Is it possible to use a loc inside a loc in Pandas to replace a value?

Let me first sketch the problem I am trying to solve. I am trying to replace a value '-1' with another value in the same column, based on two other values in the row containing the '-1'. To be more clear, here is an example. In the dataframe underneath, there are two missing values in the 'owner' column. What I want is to replace each '-1' value with a value in the 'owner' column that has the same 'price' value and which is the first one that occurs earlier in 'time' than the '-1' value. So, in this example, the first -1 value is found in line 3. The corresponding 'price' and 'time' is cheap and 2011-01-01 13:30:00 . So now, I would like to replace the -1 with the name of the owner, that has a cheap car, which is the first in time before the one under consideration, so the first time before 2011-01-01 13:30:00 . In this case, that would be the one in line 1, with owner name Jane . This should also automatically be done for any following -1 values (eg for the Bmw).

   brand   price time                 owner
0   Honda  cheap 2008-01-01 13:30:00  Marc
1  Toyota  cheap 2009-01-01 13:30:00  Jane
2    Ford   alot 2010-01-01 13:30:00  Phil
3    Audi  cheap 2011-01-01 13:30:00    -1
4   Volvo  cheap 2012-01-01 13:30:00  Jane
5     Bmw   alot 2013-01-01 13:30:00    -1

The way I wanted to solve this problem, was by first locating the -1, then saving the corresponding price and time and then locating the first corresponding price in time and replace the owner value. I wanted to use the Pandas Loc method for this in the following way (I included the code to make the dataframe as well).

import pandas as pd
from datetime import datetime

cars = {'brand': ['Honda','Toyota','Ford','Audi','Volvo','Bmw'],
        'price': ['cheap','cheap','alot','cheap','cheap','alot'],
        'time': [datetime.strptime('1/1/2008 1:30 PM', '%m/%d/%Y %I:%M %p'),datetime.strptime('1/1/2009 1:30 PM', '%m/%d/%Y %I:%M %p'),datetime.strptime('1/1/2010 1:30 PM', '%m/%d/%Y %I:%M %p'),datetime.strptime('1/1/2011 1:30 PM', '%m/%d/%Y %I:%M %p'),
                 datetime.strptime('1/1/2012 1:30 PM', '%m/%d/%Y %I:%M %p'),datetime.strptime('1/1/2013 1:30 PM', '%m/%d/%Y %I:%M %p')],
        'owner': ['Marc', 'Jane','Phil','-1','Jane','-1']}

df = pd.DataFrame(cars, columns = ['brand', 'price','time','owner'])

P_T = df.loc[df.owner == '-1',['price','time']

df.loc[df.owner == '-1', 'owner'] = df.loc[(df.price == P_T.price)&(df.time < P_T.time), 'owner']

As you can see in the final line, this is essentially a loc within a loc, whereas the conditions on the right hand side of the equation are both based on the P_T loc. However, here comes the problem, as I keep getting this error: ValueError: Can only compare identically-labeled Series objects I think I am doing something wrong and maybe doing things not as efficient as possible... So I would really appreciate some help with this matter.

I think you've over complicated this - what essentially you need to do is fill -1 values with the last corresenponding value grouped by the price column? if so a forward fill will do ffill

import numpy as np
s = df.replace('-1',np.nan).sort_values('time').groupby(['price'])['owner'].ffill()

df['owner'] = df.index.map(s)



print(df)


    brand  price                time owner
0   Honda  cheap 2008-01-01 13:30:00  Marc
1  Toyota  cheap 2009-01-01 13:30:00  Jane
2    Ford   alot 2010-01-01 13:30:00  Phil
3    Audi  cheap 2011-01-01 13:30:00  Jane
4   Volvo  cheap 2012-01-01 13:30:00  Jane
5     Bmw   alot 2013-01-01 13:30:00  Phil

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM