How to access previous row value in pandas lambda function or get the index of each row when doing df.apply()

See the initial question at the end.

I have a DataFrame like so:

import pandas as pd

df = pd.DataFrame({'Persons':[10,20,30], 'Bill':[110,240,365], 'Guests':[12,25,29],'Visitors':[15,23,27]})
df

Persons     Bill    Guests  Visitors
10          110     12      15
20          240     25      23
30          365     29      27

I want a DataFrame like the one below:


Persons     Bill    Guests  Visitors Charge  VisitorsCharge
10          110     12      15       136     175
20          240     25      23       302.5   277.5
30          365     29      27       352.5   327.5

Here, Charge is the value interpolated at Guests, using the Persons and Bill columns as the reference points.

Take the first row: 10 Persons rack up a Bill of 110, and 20 Persons (the next row) rack up a Bill of 240. So what Charge do 12 Guests create?
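Worked through for the first row (the slope m and intercept b are just labels introduced here), the linear interpolation is:

```latex
m = \frac{240 - 110}{20 - 10} = 13, \qquad b = 110 - 13 \cdot 10 = -20
\text{Charge} = m \cdot 12 + b = 156 - 20 = 136
```

Evaluating the same line at Visitors = 15 gives 13 * 15 - 20 = 175, which matches VisitorsCharge in the table above.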

The formula for this is as follows:

Row 1

import scipy.stats as stats
result = stats.linregress([10,20],[110,240])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*12 + intercept #interpolate the value
interpolatedValue

Row 2

import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*25 + intercept #interpolate the value
interpolatedValue

Row 3 (the last row, so the previous and current rows are used)

import scipy.stats as stats
result = stats.linregress([20,30],[240,365])
slope = result.slope #extract the slope of the interpolation curve
intercept = result.intercept #extract the intercept of the interpolation curve
interpolatedValue  = slope*29 + intercept #interpolate the value
interpolatedValue

For every row except the last, we use the current and next row values to get the result.

However, the last row has no "next" row, so for it we use the previous and current row values instead.

We calculate VisitorsCharge the same way, except that here the slope is multiplied by the Visitors value instead of Guests.

A regular function would solve this. However, inside a lambda I do not have access to the previous and next rows, and with df.apply I cannot figure out the index of each row as the function is applied. How do I do this?
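One possible approach, sketched here under the assumption that df is the small example frame above: when df.apply is called with axis=1, each row is passed in as a Series whose .name attribute is that row's index label, so the function can locate its neighbouring rows itself. The helper name charges is hypothetical; this is a sketch, not the canonical answer.

```python
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

def charges(row):
    # With axis=1, row.name is the index label of the current row.
    pos = df.index.get_loc(row.name)
    # Current + next row; the last row falls back to previous + current.
    start = min(pos, len(df) - 2)
    window = df.iloc[start:start + 2]
    fit = stats.linregress(window['Persons'], window['Bill'])
    return pd.Series({
        'Charge': fit.slope * row['Guests'] + fit.intercept,
        'VisitorsCharge': fit.slope * row['Visitors'] + fit.intercept,
    })

# Returning a Series from apply(axis=1) yields one new column per key.
result = df.join(df.apply(charges, axis=1))
print(result)
```

Because the row label is recovered through get_loc, the same sketch should also work with a non-default (but unique) index.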

Initial question

I have a DataFrame like so:

A   B   
1   100
2   200
3   300

I want a DataFrame like the one below:

A   B   C
1   100 '1-2-100-200'   
2   200 '2-3-200-300'
3   300 '2-3-200-300'

NB. This is the solution to the initial question. See here for an answer to the new question.

You can use shift and ffill:

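a = df['A'].astype(str)
b = df['B'].astype(str)
# pair each value with the next row's value; the last row becomes NaN
s = a+'-'+a.shift(-1)+'-'+b+'-'+b.shift(-1)
# fill the last row with the previous row's string
df['C'] = s.ffill()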

Generalization to an arbitrary number of columns:

def cat(s, sep='-'):
    s = s.astype(str)
    return s+sep+s.shift(-1)

df['C'] = df.apply(cat).ffill().agg('-'.join, axis=1)

Output:

   A    B            C
0  1  100  1-2-100-200
1  2  200  2-3-200-300
2  3  300  2-3-200-300

I think this is what you want:

import scipy.stats as stats

def compute(i, n=2):
    # position of label i; near the end the window is shifted back to keep n rows
    j = min(df.index.get_loc(i), len(df) - n)
    idx = df.index[j:j+n]
    result = stats.linregress(df.loc[idx, 'Persons'], df.loc[idx, 'Bill'])
    slope = result.slope
    intercept = result.intercept
    return slope*df.loc[i, 'Guests'] + intercept

df['Charge'] = [compute(i) for i in df.index]
# or
# df['Charge'] = df.index.to_series().apply(compute)

Output:

   Persons  Bill  Guests  Charge
0       10   110      12   136.0
1       20   240      25   302.5
2       30   365      29   352.5
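The compute function above only fills Charge. A minimal sketch of extending it to VisitorsCharge as well, assuming the same example DataFrame: the extra col parameter is a hypothetical addition, and get_loc is used so the window is located by position rather than by label. With n=2 this reproduces the current-plus-next pairing (previous-plus-current for the last row); a larger n would give a least-squares fit rather than an interpolation.

```python
import pandas as pd
import scipy.stats as stats

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

def compute(i, col='Guests', n=2):
    # Window of n consecutive rows starting at row i, shifted back near the end
    # so that it always contains n rows.
    j = min(df.index.get_loc(i), len(df) - n)
    idx = df.index[j:j + n]
    result = stats.linregress(df.loc[idx, 'Persons'], df.loc[idx, 'Bill'])
    # Evaluate the fitted line at this row's value of `col` (hypothetical parameter).
    return result.slope * df.loc[i, col] + result.intercept

df['Charge'] = [compute(i) for i in df.index]
df['VisitorsCharge'] = [compute(i, col='Visitors') for i in df.index]
print(df)
```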

Try this:

import numpy as np
import scipy.stats as stats

# bring the next row's Persons and Bill onto the current row
df['next_persons'] = df.Persons.shift(-1)
df['next_bill'] = df.Bill.shift(-1)

def your_interpolation_func(x, y, z):
    # fit a line through the two (x, y) points and evaluate it at z
    result = stats.linregress(np.array(x), np.array(y))
    return result.slope*z + result.intercept

df['charge'] = df.apply(lambda row: your_interpolation_func(
    [row.Persons, row.next_persons],
    [row.Bill, row.next_bill],
    row.Guests), axis=1)

Output:

df

   Persons  Bill  Guests  next_persons  next_bill  charge
0       10   110      12          20.0      240.0   136.0
1       20   240      25          30.0      365.0   302.5
2       30   365      29           NaN        NaN     NaN

The NaN in the last row is there because the last row has no next values. You can apply the function to df.iloc[:-1] to avoid that.
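To fill the last row from the previous pair instead of leaving NaN, one possible vectorized variant combines the shift idea with ffill, as in the answer to the initial question. This sketch swaps linregress for the explicit two-point slope (Bill difference over Persons difference), which gives the same line when exactly two points are fitted; it assumes at least two rows and that consecutive Persons values differ.

```python
import pandas as pd

df = pd.DataFrame({'Persons': [10, 20, 30], 'Bill': [110, 240, 365],
                   'Guests': [12, 25, 29], 'Visitors': [15, 23, 27]})

# Slope of the line through the current row and the next row.
slope = (df['Bill'].shift(-1) - df['Bill']) / (df['Persons'].shift(-1) - df['Persons'])
# The last row has no next row (NaN); reuse the previous pair's slope,
# i.e. the line through the previous and current rows.
slope = slope.ffill()
# In both cases the line passes through the current row's (Persons, Bill) point.
intercept = df['Bill'] - slope * df['Persons']

df['Charge'] = slope * df['Guests'] + intercept
df['VisitorsCharge'] = slope * df['Visitors'] + intercept
print(df)
```

Because every row's line passes through that row's own (Persons, Bill) point, the intercept can be computed from the current row whether the slope came from the next pair or the previous one.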


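Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original URL. For any questions, contact yoyou2525@163.com.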

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM