How to access previous row value in pandas lambda function or get the index of each row when doing df.apply()

See the initial question at the end.

I have a dataframe like so

import pandas as pd

df = pd.DataFrame({'Persons':[10,20,30], 'Bill':[110,240,365], 'Guests':[12,25,29],'Visitors':[15,23,27]})
df

Persons     Bill    Guests  Visitors
10          110     12      15
20          240     25      23
30          365     29      27

I want a data frame like below


Persons     Bill    Guests  Visitors Charge  VisitorsCharge
10          110     12      15       136     175
20          240     25      23       302.5   277.5
30          365     29      27       352.5   327.5

Here Charge is the interpolated value corresponding to Guests, with the Persons and Bill columns as reference.

Take the first row: 10 Persons rack up a Bill of 110 and 20 Persons rack up a Bill of 240. So what Charge do 12 Guests create?

Formula for this is as below

Row1

import scipy.stats as stats
result = stats.linregress([10,20],[110,240])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*12 + intercept #interpolate the value
interpolatedValue

Row2

import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*25 + intercept #interpolate the value
interpolatedValue

Row3

import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*29 + intercept #interpolate the value
interpolatedValue

For every row except the last row, we have to use the current & the next row values to get our result.

However, when we reach the last row, there is no 'next' row. So we use the current row and the previous row values instead.

We do the same to calculate VisitorsCharge as well, but there we multiply the slope by the Visitors column value instead of Guests.

A function would solve the issue. However, with a lambda function I do not have access to the previous and next rows, and with df.apply I cannot figure out the index of each row as the function is being applied. How do I do it?

Initial question

I have a dataframe like so

A   B   
1   100
2   200
3   300

I want a data frame like below

A   B   C
1   100 '1-2-100-200'   
2   200 '2-3-200-300'
3   300 '2-3-200-300'

NB: this is a solution to the initial question. See here for an answer to the new question.

You can use shift and ffill:

a = df['A'].astype(str)
b = df['B'].astype(str)
s = a+'-'+a.shift(-1)+'-'+b+'-'+b.shift(-1)
df['C'] = s.ffill()

Generalization for an arbitrary number of columns:

def cat(s, sep='-'):
    s = s.astype(str)
    return s+sep+s.shift(-1)

df['C'] = df.apply(cat).ffill().agg('-'.join, axis=1)

output:

   A    B            C
0  1  100  1-2-100-200
1  2  200  2-3-200-300
2  3  300  2-3-200-300

I think this is what you want:

import scipy.stats as stats

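def compute(i, n=2):
    # Assumes the default RangeIndex, so index labels and positions coincide.
    # Clamp the window start so the last row reuses the previous pair of rows.
    j = min(i, df.index[len(df)-n])
    idx = df.index[j:j+n]
    # Fit a line through the n rows in the window.
    result = stats.linregress(df.loc[idx, 'Persons'], df.loc[idx, 'Bill'])
    slope = result.slope
    intercept = result.intercept
    # Evaluate the fitted line at this row's Guests value.
    return slope*df.loc[i, 'Guests'] + intercept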

df['Charge'] = [compute(i) for i in df.index]
# or
# df['Charge'] = df.index.to_series().apply(compute)

output:

   Persons  Bill  Guests  Charge
0       10   110      12   136.0
1       20   240      25   302.5
2       30   365      29   352.5
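
The same windowed idea can be sketched as a reusable helper that also fills VisitorsCharge. This is only a sketch: `windowed_charge` and its parameters are hypothetical names, it works on row positions (so it does not depend on the index labels), it assumes the frame has at least `n` rows, and it swaps `stats.linregress` for `np.polyfit` with degree 1, which fits the same least-squares line.

```python
import numpy as np
import pandas as pd

def windowed_charge(frame, value_col, x_col="Persons", y_col="Bill", n=2):
    """Hypothetical helper: fit a line on n consecutive rows, evaluate it at value_col."""
    out = []
    for pos in range(len(frame)):
        # Clamp the window start so the last rows reuse the final full window.
        start = min(pos, len(frame) - n)
        window = frame.iloc[start:start + n]
        # np.polyfit with degree 1 returns the coefficients as (slope, intercept).
        slope, intercept = np.polyfit(window[x_col], window[y_col], 1)
        out.append(slope * frame[value_col].iloc[pos] + intercept)
    return pd.Series(out, index=frame.index)

# Non-default index labels, to show that only positions are used.
df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]},
                  index=['a', 'b', 'c'])

df['Charge'] = windowed_charge(df, 'Guests')
df['VisitorsCharge'] = windowed_charge(df, 'Visitors')
```

Passing the column name as an argument keeps the window logic in one place, so Charge and VisitorsCharge differ only in which column is evaluated on the fitted line.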

Try this:

import numpy as np
import scipy.stats as stats

df['next_persons'] = df.Persons.shift(-1)
df['next_bill'] = df.Bill.shift(-1)

def your_interpolation_func(x, y, z): 
    result = stats.linregress(np.array(x), np.array(y))
    return result.slope*z + result.intercept

df['charge'] = df.apply(lambda row: your_interpolation_func(
    [row.Persons, row.next_persons],
    [row.Bill, row.next_bill],
    row.Guests), axis=1)

Output:

df

   Persons  Bill  Guests  next_persons  next_bill  charge
0       10   110      12          20.0      240.0   136.0
1       20   240      25          30.0      365.0   302.5
2       30   365      29           NaN        NaN     NaN

The NaN in the last row appears because the last row has no next row to pair with. You can apply the function to df.iloc[:-1] to avoid that.
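
If the last row should instead reuse the line from the previous pair of rows, as the question asks, one possible sketch avoids `apply` altogether. It relies on an assumption worth stating: with exactly two points, the regression line is just the straight line through them, so the slope and intercept can be computed directly with `shift` and then forward-filled into the last row.

```python
import pandas as pd

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

# Line through each row and the next one (NaN for the last row).
slope = (df['Bill'].shift(-1) - df['Bill']) / (df['Persons'].shift(-1) - df['Persons'])
intercept = df['Bill'] - slope * df['Persons']

# The last row has no next row, so it reuses the previous row's line.
slope = slope.ffill()
intercept = intercept.ffill()

df['Charge'] = slope * df['Guests'] + intercept
df['VisitorsCharge'] = slope * df['Visitors'] + intercept
```

This only matches `stats.linregress` for two-point windows; a wider window would need an actual fit.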
