
Lambda function to use in dataframe

I have the following vector:

3 
5
6
7
4
6
7 
8

I would like to implement a lambda function that, given a vector element i, computes the mean of the elements at positions i-3, i-2, i-1 and i. However, I do not know how to access the i-3, i-2 and i-1 elements inside the lambda function.

You can use the rolling() method to access the elements of a pandas Series within a specified window. Then, you can use a lambda function to calculate the mean of the elements in that window. To include the three elements to the left of the current element, use a window size of 4:

In [39]: import pandas as pd

In [40]: S = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

In [41]: S.rolling(4).apply(lambda x: x.mean())
Out[41]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64
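
To make it clearer what the lambda receives, here is a minimal sketch that compares the rolling result with the same means computed by hand. The names `s`, `rolled` and `manual` are only illustrative and are not part of the original answer.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

# Each call of the lambda gets one window: the current element
# plus the three elements before it.
rolled = s.rolling(4).apply(lambda window: window.mean())

# The same means computed by hand for every position that has
# three predecessors (positions 3 to 7).
manual = [s.iloc[i - 3 : i + 1].mean() for i in range(3, len(s))]

print(rolled.tolist())
print(manual)
```

The hand-computed list has no entries for the first three positions, which correspond to the NaN values in the rolling result.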

You'll note that the first three elements have missing values. This is because a window of size 4 can only be formed from the fourth element onwards. However, if you want to calculate with smaller windows for the first elements, you can use the argument min_periods to specify the smallest valid window size:

In [42]: S.rolling(4, min_periods=1).apply(lambda x: x.mean())
Out[42]: 
0    3.000000
1    4.000000
2    4.666667
3    5.250000
4    5.500000
5    5.750000
6    6.000000
7    6.250000
dtype: float64
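
As a rough illustration of what min_periods=1 changes, the sketch below counts how many elements each window actually contains and computes the mean next to it. The variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([3, 5, 6, 7, 4, 6, 7, 8])

windows = s.rolling(4, min_periods=1)

# Number of elements available in each window: it grows from 1 to 4
# and then stays at the full window size.
sizes = windows.count()

# Mean over whatever elements each window contains.
means = windows.mean()

print(pd.DataFrame({"size": sizes, "mean": means}))
```

The first three means are therefore averages over one, two and three elements respectively, not over four.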

Having said that, you don't need the lambda in the first place; I included it only because you explicitly asked for lambdas. The rolling() method creates a Rolling object that has a built-in mean function that you can use, like so:

In [43]: S.rolling(4).mean()
Out[43]: 
0     NaN
1     NaN
2     NaN
3    5.25
4    5.50
5    5.75
6    6.00
7    6.25
dtype: float64
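
Since the question title mentions a dataframe, here is a small sketch of the same built-in mean applied to a DataFrame column. The column names `value` and `rolling_mean` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical single-column DataFrame holding the vector from the question.
df = pd.DataFrame({"value": [3, 5, 6, 7, 4, 6, 7, 8]})

# Store the mean of each element and its three predecessors in a new column.
df["rolling_mean"] = df["value"].rolling(4).mean()

print(df)
```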

If you want to do it on a pandas DataFrame, the easiest way is to use .loc, assuming you know the index position of i. (.loc selects by label, which matches the position here because the DataFrame uses the default integer index.)

 import pandas as pd

 df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])
 # x is the index position of your target value.
 # The reversed slice selects rows x, x-1, x-2 and x-3.
 setx = lambda x: df.loc[x:x-3:-1].mean()
 > setx(4)  # Without mean() gives values [4, 7, 6, 5]
 >> 0    5.5
 >> dtype: float64

If you want to stick to PEP8, though, it is best to define a function with def rather than assign a lambda expression to an identifier, which PEP8 advises against (see python.org/dev/peps/pep-0008/#id50). Thank you @Schmuddi for the clarification.
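
A def-based version might look like the sketch below. The function name `mean_of_last_four` is hypothetical, and it uses positional .iloc slicing instead of the reversed .loc slice above, so treat it as one possible variant and not as the original approach.

```python
import pandas as pd

df = pd.DataFrame([3, 5, 6, 7, 4, 6, 7, 8])


def mean_of_last_four(frame, i):
    """Mean of the first column at positions i-3 .. i (hypothetical helper).

    Near the start of the frame fewer than four values exist, so the
    window is shortened to the values that are available.
    """
    start = max(i - 3, 0)
    return frame.iloc[start : i + 1, 0].mean()


print(mean_of_last_four(df, 4))  # averages the values 5, 6, 7, 4
```

Shortening the window at the start mirrors min_periods=1 in rolling(); whether that or a missing value is the right behaviour depends on your use case.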

