Pandas/Pythonic way to group by a column X and, within each group, return the value in column Z based on the value in column Y
Reproducible example:
import pandas as pd

df = pd.DataFrame([[1, '2015-12-15', 10],
                   [1, '2015-12-16', 13],
                   [1, '2015-12-17', 16],
                   [2, '2015-12-15', 19],
                   [2, '2015-12-11', 22],
                   [2, '2015-12-18', 25],
                   [3, '2015-12-14', 28],
                   [3, '2015-12-12', 31],
                   [3, '2015-12-15', 34]])
df.columns = ['X', 'Y', 'Z']
print(df.dtypes)
print()
print(df)
The output of the reproducible example, and the datatype of each column:
X int64
Y object
Z int64
dtype: object
X Y Z
0 1 2015-12-15 10
1 1 2015-12-16 13
2 1 2015-12-17 16
3 2 2015-12-15 19
4 2 2015-12-11 22
5 2 2015-12-18 25
6 3 2015-12-14 28
7 3 2015-12-12 31
8 3 2015-12-15 34
Expected output:
X Y Z
0 1 2015-12-15 10
1 1 2015-12-15 10
2 2 2015-12-11 22
3 2 2015-12-15 19
4 3 2015-12-12 31
5 3 2015-12-15 34
Explanation of what that output is:

For every group in column X (after grouping by X), I want one row with the value in column Z where the value in column Y for that group is min(all dates/objects in column Y), and, for the same group, another row with the value in column Z where the value in column Y is some custom date that definitely exists for all groups and will be hardcoded. So every group would have two rows.
In my output, for group 1 the value in column Z is 10, because the value in column Z associated with the minimum of all dates in column Y for group 1, 2015-12-15, is 10. For the same group 1, the second row's value in column Z, for the custom date 2015-12-15, is also 10. For group 2, min(all dates/objects in column Y) is 2015-12-11, and the corresponding value in column Z for that date is 22; for the custom date 2015-12-15, it is 19.
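In other words, the expected output is each group's min-date row plus each group's custom-date row. A minimal sketch of that selection (converting Y to datetime first so the minimum compares dates rather than strings; note that group 1's two rows coincide because its minimum date is the custom date):

```python
import pandas as pd

df = pd.DataFrame([[1, '2015-12-15', 10], [1, '2015-12-16', 13], [1, '2015-12-17', 16],
                   [2, '2015-12-15', 19], [2, '2015-12-11', 22], [2, '2015-12-18', 25],
                   [3, '2015-12-14', 28], [3, '2015-12-12', 31], [3, '2015-12-15', 34]],
                  columns=['X', 'Y', 'Z'])
df['Y'] = pd.to_datetime(df['Y'])            # so min() compares dates, not strings
custom_date = pd.Timestamp('2015-12-15')     # the hardcoded date present in every group

min_rows = df.loc[df.groupby('X')['Y'].idxmin()]   # one min-date row per group
custom_rows = df[df['Y'] == custom_date]           # one custom-date row per group
out = pd.concat([min_rows, custom_rows]).sort_values(['X', 'Y']).reset_index(drop=True)
print(out)
```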
Here is what I assume is some inefficient, linear-search code that I wrote to accomplish this:
from collections import Counter
import pandas as pd

uniqueXs = list(dict(Counter(df['X'].tolist())).keys())  # Every unique item in column X, as a list
df_list = []  # Empty list that will hold the rows of my final DataFrame
for x in uniqueXs:  # Iterate through each unique value in column X
    idfiltered_dataframe = df.loc[df['X'] == x]  # Filter the DataFrame on the current value in column X
    min_date = min(idfiltered_dataframe['Y'])  # Min of column Y
    custom_date = '2015-12-15'  # Every group WILL have this custom date
    mindatefiltered_dataframe = idfiltered_dataframe.loc[idfiltered_dataframe['Y'] == min_date]  # Within the group, rows where column Y has the minimum date
    customdatefiltered_dataframe = idfiltered_dataframe.loc[idfiltered_dataframe['Y'] == custom_date]  # Within the group, rows where column Y has the custom date
    for row_1 in mindatefiltered_dataframe.index:  # Collect the required values from each min-date row
        row_list = [mindatefiltered_dataframe.at[row_1, 'X'], mindatefiltered_dataframe.at[row_1, 'Y'], mindatefiltered_dataframe.at[row_1, 'Z']]
        df_list.append(row_list)  # Append to a master list
    for row_2 in customdatefiltered_dataframe.index:  # Collect the required values from each custom-date row
        row_list = [customdatefiltered_dataframe.at[row_2, 'X'], customdatefiltered_dataframe.at[row_2, 'Y'], customdatefiltered_dataframe.at[row_2, 'Z']]
        df_list.append(row_list)  # Append to a master list
print(pd.DataFrame(df_list))  # Build a DataFrame out of the master list
I'm under the impression that there is some slick way where you just do df.groupby.. and get the expected output, and I'm hoping someone could provide me with the code to do that.
IIUC
g1 = df.groupby('X').Y.value_counts().count(level=1).eq(df.X.nunique())  # Dates that appear in every X group, found via value_counts
df.Y = pd.to_datetime(df.Y)  # Convert to datetime so the dates sort correctly
g2 = df.sort_values('Y').groupby('X').head(1)  # The min-date row of each group
pd.concat([df.loc[df.Y.isin(g1[g1].index)], g2]).sort_index()  # Combine the two selections
Out[280]:
X Y Z
0 1 2015-12-15 10
0 1 2015-12-15 10
3 2 2015-12-15 19
4 2 2015-12-11 22
7 3 2015-12-12 31
8 3 2015-12-15 34
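Note that `count(level=1)` relies on the old `Series.count(level=...)` API, which was deprecated and later removed in pandas 2.0. If that line errors on a recent pandas, an equivalent way to find the dates present in every X group (a sketch, assuming the same df as in the question):

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'Y': ['2015-12-15', '2015-12-16', '2015-12-17',
                         '2015-12-15', '2015-12-11', '2015-12-18',
                         '2015-12-14', '2015-12-12', '2015-12-15'],
                   'Z': [10, 13, 16, 19, 22, 25, 28, 31, 34]})

# For each date, count the distinct X groups it occurs in, and keep
# the dates whose count equals the total number of groups.
per_date = df.groupby('Y')['X'].nunique()
g1 = per_date.eq(df['X'].nunique())
print(g1[g1].index.tolist())  # ['2015-12-15']
```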
Use -
import datetime as dt

date_fill = dt.datetime.strptime('2015-12-15', '%Y-%m-%d')
df['Y'] = pd.to_datetime(df['Y'], format='%Y-%m-%d')
df_g = df.loc[df.groupby(['X'])['Y'].idxmin()]
df2 = df[df['Y']==date_fill]
target_map = pd.Series(df2['Z'].tolist(),index=df2['X']).to_dict()
df_g.index = range(1, 2*len(df_g)+1, 2)
df_g = df_g.reindex(index=range(2*len(df_g)))
df_g['Y'] = df_g['Y'].fillna(date_fill)
df_g = df_g.bfill()
df_g.loc[df_g['Y']==date_fill, 'Z'] = df_g[df_g['Y']==date_fill]['X'].map(target_map)
df_g = df_g.bfill()
print(df_g)
Output
X Y Z
0 1.0 2015-12-15 10.0
1 1.0 2015-12-15 10.0
2 2.0 2015-12-15 19.0
3 2.0 2015-12-11 22.0
4 3.0 2015-12-15 34.0
5 3.0 2015-12-12 31.0
Explanation
- date_fill is the hardcoded custom date.
- df.groupby(['X'])['Y'].idxmin() takes the index of the min-Y row of each group.
- target_map is a dict created to preserve the Z values (keyed by X) for later.
- df_g is expanded so that every alternate row holds na values.
- df_g = df_g.bfill() appears twice in case you enter a date in date_fill that isn't present in the df. In that case target_map won't populate and you would otherwise end up with na values.

I am sure this can be optimized somewhat, but the thought process should help you proceed.
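The alternate-row expansion works by parking the existing rows at odd index positions and then reindexing over the full range, so that every even position becomes an empty row to be filled in. Isolated as a toy sketch (a one-column frame standing in for df_g):

```python
import pandas as pd

s = pd.DataFrame({'a': [10, 22, 31]})
s.index = range(1, 2 * len(s) + 1, 2)   # existing rows move to positions 1, 3, 5
s = s.reindex(index=range(2 * len(s)))  # positions 0, 2, 4 become NaN rows
print(s)
```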