What does .transform('first') do?
Somebody helped me with some code. I understood everything in it except the very last line, .transform('first').
I can see what it does, but I'd like to know precisely what it is doing behind the scenes to obtain this result.
This is the part of the code I understand:
df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                .groupby(df['Date'].dt.year)
                .Value
                .cumsum()
                .sub(df['Value'])
                .add(df['YTD'])
            )
This is the output of this first part:
Value Type Date YTD YEP
0 100 Budget 2019-01-01 101.0 974.0
1 50 Budget 2019-02-01 199.0 1022.0
2 20 Budget 2019-03-01 275.0 1078.0
3 123 Budget 2019-04-01 332.0 1012.0
4 56 Budget 2019-05-01 NaN NaN
5 76 Budget 2019-06-01 NaN NaN
6 98 Budget 2019-07-01 NaN NaN
7 126 Budget 2019-08-01 NaN NaN
8 90 Budget 2019-09-01 NaN NaN
9 80 Budget 2019-10-01 NaN NaN
10 67 Budget 2019-11-01 NaN NaN
11 87 Budget 2019-12-01 NaN NaN
12 101 Actual 2019-01-01 101.0 NaN
13 98 Actual 2019-02-01 199.0 NaN
14 76 Actual 2019-03-01 275.0 NaN
15 57 Actual 2019-04-01 332.0 NaN
This is the entire code:
df['Date'] = pd.to_datetime(df['Date'])
df['YEP'] = ( df[::-1].loc[df['Type'].eq('Budget')]
                .groupby(df['Date'].dt.year)
                .Value
                .cumsum()
                .sub(df['Value'])
                .add(df['YTD'])
                .groupby(df['Date'])
                .transform('first') )
I got this after running the entire code:
Value Type Date YTD YEP
0 100 Budget 2019-01-01 101.0 974.0
1 50 Budget 2019-02-01 199.0 1022.0
2 20 Budget 2019-03-01 275.0 1078.0
3 123 Budget 2019-04-01 332.0 1012.0
4 56 Budget 2019-05-01 NaN NaN
5 76 Budget 2019-06-01 NaN NaN
6 98 Budget 2019-07-01 NaN NaN
7 126 Budget 2019-08-01 NaN NaN
8 90 Budget 2019-09-01 NaN NaN
9 80 Budget 2019-10-01 NaN NaN
10 67 Budget 2019-11-01 NaN NaN
11 87 Budget 2019-12-01 NaN NaN
12 101 Actual 2019-01-01 101.0 974.0
13 98 Actual 2019-02-01 199.0 1022.0
14 76 Actual 2019-03-01 275.0 1078.0
15 57 Actual 2019-04-01 332.0 1012.0
I know that "transform" is like "apply", but I don't get what it means to apply (or transform) with this parameter 'first'. What does first do here when combined with transform?
Thank you
The parameter of the .transform() method may be a NumPy function, a string function name, or a user-defined function. This means that in the line
.transform('first')
it is a string function name, so it represents the function first().
Where does the function first() come from?
It is a method of the GroupBy object.
What does the function first() return?
It returns the first non-NaN value in a series, or NaN if there is none.
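As a small illustration of that rule, here is a minimal sketch on made-up data (the series `s` and the grouping list `keys` are hypothetical, not taken from the question). It shows that the GroupBy method first() skips a leading NaN and yields NaN only when the whole group is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "x" starts with NaN, group "y" is all NaN.
s = pd.Series([np.nan, 10.0, 20.0, np.nan, np.nan])
keys = ["x", "x", "x", "y", "y"]

# One value per group: the first non-NaN value, or NaN if there is none.
firsts = s.groupby(keys).first()
print(firsts)
```

Here the value for group "x" is 10.0, not the NaN that sits in the first position, and the value for group "y" is NaN.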
What does the method .transform() do?
It applies its function parameter to every column (i.e. every series) of the dataframe to obtain a new (transformed) column. Then it returns a dataframe consisting of these transformed columns.
In the case of a series it returns a transformed series.
Does this mean that the function parameter of the .transform method must return a series with the same size?
No, that is only one possibility.
The other is a scalar, which will be broadcast (repeated) to make a series with the same size.
The function used here (the GroupBy method first()) is a good example of such a function.
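The two possibilities can be sketched side by side. The data and the names `demeaned` and `group_max` below are made up for illustration; the first function returns a series of the same size as its group, the second returns a scalar that is then repeated over the group:

```python
import pandas as pd

# Hypothetical data: two groups of two values each.
s = pd.Series([1.0, 2.0, 3.0, 4.0])
keys = ["a", "a", "b", "b"]
g = s.groupby(keys)

# The function returns a series of the same size as the group.
demeaned = g.transform(lambda x: x - x.mean())

# The function returns a scalar per group; it is broadcast to the group's size.
group_max = g.transform(lambda x: x.max())

print(demeaned.tolist())   # [-0.5, 0.5, -0.5, 0.5]
print(group_max.tolist())  # [2.0, 2.0, 4.0, 4.0]
```

In both cases the result has the same length and index as the original series.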
So what does .transform('first') return?
It returns a series / dataframe with the same shape as the source group chunk, in which all values in every individual column are replaced with the first non-NaN value in that column, or with NaN if there is none.
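To make the difference in shape concrete, the following sketch compares .first() with .transform('first') on a small made-up dataframe (the column names `key`, `u` and `v` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with two value columns and one grouping column.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "u": [np.nan, 1.0, 2.0, 3.0],
    "v": [5.0, 6.0, np.nan, np.nan],
})

# .first() reduces: one row per group.
reduced = df.groupby("key").first()

# .transform('first') keeps one row per original row and repeats,
# column by column, the first non-NaN value of the group.
broadcast = df.groupby("key").transform("first")

print(reduced)
print(broadcast)
```

Column `u` becomes 1.0, 1.0, 2.0, 2.0, while column `v` becomes 5.0, 5.0, NaN, NaN because group "b" has no non-NaN value in `v`.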
The lines
.groupby(df['Date'])
.transform('first')
first split your (intermediate) series into groups by individual date and then, just before recombination, apply the first() function to every series in every group.
This effectively replaces every value in every group with the first non-NaN value in its series, if such a value exists.
It means that in the resulting series (your new column), every value of the intermediate series is replaced with the first non-NaN value for the same day (if such a value exists).
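This step can be reproduced on a cut-down version of the data. The sketch below keeps only the January and February rows and takes the intermediate values (974.0, 1022.0 and NaN for the 'Actual' rows) from the first output table in the question; the names `intermediate` and `filled` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data: two Budget rows, two Actual rows.
df = pd.DataFrame({
    "Type": ["Budget", "Budget", "Actual", "Actual"],
    "Date": pd.to_datetime(["2019-01-01", "2019-02-01",
                            "2019-01-01", "2019-02-01"]),
})

# Intermediate series before the last two lines: Actual rows are still NaN.
intermediate = pd.Series([974.0, 1022.0, np.nan, np.nan])

# Group by date and repeat the first non-NaN value within each date.
filled = intermediate.groupby(df["Date"]).transform("first")
print(filled.tolist())  # [974.0, 1022.0, 974.0, 1022.0]
```

Each 'Actual' row picks up the value of the 'Budget' row with the same date, which matches the final output shown in the question.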
After grouping by date ( groupby(df['Date']) ), each value is changed to the first non-NaN value among the rows that share this date. This changes the NaN values in the YEP column of the 'Actual' rows to the values from the 'Budget' rows with the same date.