Is it possible to apply groupby inside a lambda function?
I want to apply the pandas groupby method to three columns (Input File Name, Page Number, Top), but only for rows where the column "Group" equals "Group 1", and keep the rest of the rows as they are.
DATASET (EDITED):
{'Input File Name': {82: '109 S Ankeny Blvd_Flyer',
83: '109 S Ankeny Blvd_Flyer',
84: '109 S Ankeny Blvd_Flyer',
85: '109 S Ankeny Blvd_Flyer',
86: '109 S Ankeny Blvd_Flyer',
87: '109 S Ankeny Blvd_Flyer',
88: '109 S Ankeny Blvd_Flyer',
89: '109 S Ankeny Blvd_Flyer',
90: '109 S Ankeny Blvd_Flyer',
91: '109 S Ankeny Blvd_Flyer'},
'Page Number': {82: 2,
83: 2,
84: 2,
85: 2,
86: 2,
87: 2,
88: 2,
89: 2,
90: 2,
91: 2},
'Content': {82: '5 Mile',
83: 'Population',
84: '12,898',
85: '59,989',
86: '67,553',
87: 'Dustin Whitehead, CCIM',
88: 'Vice President ',
89: 'AVG. HH Income',
90: '$84,258',
91: '$98,879'},
'Font Class': {82: 0,
83: 1,
84: 1,
85: 1,
86: 1,
87: 0,
88: 1,
89: 1,
90: 1,
91: 1},
'Font Size': {82: 11,
83: 11,
84: 11,
85: 11,
86: 11,
87: 11,
88: 11,
89: 11,
90: 11,
91: 11},
'Top': {82: 890,
83: 914,
84: 914,
85: 914,
86: 914,
87: 918,
88: 935,
89: 940,
90: 940,
91: 940},
'Left': {82: 459,
83: 67,
84: 244,
85: 352,
86: 460,
87: 679,
88: 679,
89: 67,
90: 244,
91: 352},
'Width': {82: 41,
83: 72,
84: 42,
85: 46,
86: 43,
87: 171,
88: 102,
89: 111,
90: 54,
91: 54},
'Height': {82: 18,
83: 17,
84: 17,
85: 17,
86: 17,
87: 18,
88: 17,
89: 17,
90: 17,
91: 17},
'Group_ID': {82: 6777192940,
83: 7210324356,
84: 1579068838,
85: 6589580993,
86: 370057979,
87: 381660731,
88: 8098819066,
89: 7210324356,
90: 1579068838,
91: 6589580993},
'Group': {82: nan,
83: 'Group 1',
84: 'Group 1',
85: 'Group 1',
86: 'Group 1',
87: 'Group 1',
88: 'Group 1',
89: 'Group 1',
90: 'Group 1',
91: 'Group 1'},
}
Below is the code I've tried (df is my entire dataset):
df[df['Group'].apply(lambda x:x=='Group 1')].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))
If I use the above code, it applies to the Group 1 rows only and drops the rest. So I tried another way:
df.apply(lambda x: x.groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x)) if x['Group']=='Group 1' else x)
As a result I get KeyError: ('Group', 'occurred at index Input File Name')
Is this method possible?
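The KeyError comes from how `DataFrame.apply` works: with its default `axis=0` it calls the lambda once per column, passing that column as a Series indexed by the row labels, so `x['Group']` asks for a row labelled `'Group'`, which does not exist. (Older pandas versions appear to have appended the column being processed to the message, hence "occurred at index Input File Name".) Even with `axis=1` the lambda only ever sees one row at a time, so there is nothing inside it for a groupby to work on. A minimal sketch on a small made-up frame that reuses the question's column names:

```python
import numpy as np
import pandas as pd

# Small made-up frame reusing the question's column names (values trimmed)
df = pd.DataFrame({
    "Input File Name": ["109 S Ankeny Blvd_Flyer"] * 3,
    "Top": [890, 914, 914],
    "Group": [np.nan, "Group 1", "Group 1"],
})

seen = []

def inspect(x):
    # With the default axis=0, DataFrame.apply passes each COLUMN as a Series
    # whose name is the column label and whose index holds the row labels.
    seen.append((x.name, list(x.index)))
    return "Group" in x.index  # no row is labelled "Group"

result = df.apply(inspect)
print(result)
print(seen)

# Hence x["Group"] inside the lambda looks up a row label "Group" -> KeyError
caught = None
try:
    df.apply(lambda x: x["Group"] == "Group 1")
except KeyError as err:
    caught = err
    print("KeyError:", err)
```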
Desired Result (EDITED):
{'Input File Name': {0: '109 S Ankeny Blvd_Flyer',
1: '109 S Ankeny Blvd_Flyer',
2: '109 S Ankeny Blvd_Flyer',
3: '109 S Ankeny Blvd_Flyer',
4: '109 S Ankeny Blvd_Flyer'},
'Page Number': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
'Top': {0: 890, 1: 914, 2: 918, 3: 935, 4: 940},
'Content': {0: '5 Mile',
1: ['Population', '12,898', '59,989', '67,553'],
2: ['Dustin Whitehead, CCIM'],
3: ['Vice President'],
4: ['AVG. HH Income', '$84,258', '$98,879']},
'Font Class': {0: 0, 1: [1, 1, 1, 1], 2: [0], 3: [1], 4: [1, 1, 1]},
'Font Size': {0: 11,
1: [11, 11, 11, 11],
2: [11],
3: [11],
4: [11, 11, 11]},
'Left': {0: 459,
1: [67, 244, 352, 460],
2: [679],
3: [679],
4: [67, 244, 352]},
'Width': {0: 41, 1: [72, 42, 46, 43], 2: [171], 3: [102], 4: [111, 54, 54]},
'Height': {0: 18, 1: [17, 17, 17, 17], 2: [18], 3: [17], 4: [17, 17, 17]},
'Group_ID': {0: 6777192940,
1: [7210324356, 1579068838, 6589580993, 370057979],
2: [381660731],
3: [8098819066],
4: [7210324356, 1579068838, 6589580993]},
'Group': {0: nan,
1: ['Group 1', 'Group 1', 'Group 1', 'Group 1'],
2: ['Group 1'],
3: ['Group 1'],
4: ['Group 1', 'Group 1', 'Group 1']},
}
How about doing it in two steps: first filter the rows where Group == 'Group 1' and group only those, then keep the remaining rows separately. Something like:
df_group1 = df[df['Group'] == 'Group 1'].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))
df_rest = df[df['Group'] != 'Group 1']
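One detail worth checking in the two masks above: rows whose Group is NaN (like the "5 Mile" row) end up in df_rest, because in pandas NaN compares unequal to every value. A quick check, with a hypothetical "Group 2" value added only for contrast:

```python
import numpy as np
import pandas as pd

# "Group 2" is a hypothetical extra value, added only for contrast
groups = pd.Series([np.nan, "Group 1", "Group 2"])

group1_mask = groups == "Group 1"  # NaN == "Group 1" -> False
rest_mask = groups != "Group 1"    # NaN != "Group 1" -> True

print(group1_mask.tolist())  # [False, True, False]
print(rest_mask.tolist())    # [True, False, True]

# The two masks are exact complements, so every row lands in exactly one part
print((rest_mask == ~group1_mask).all())  # True
```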
But combining those with pd.concat would need more information, e.g. should the aggregated columns become new lists? What should happen to the other groups, None?
Without the exact desired output, I cannot help you further with this.
EDIT: with the input and output you have provided, you can just follow the steps above and then concat the two parts.
Full code:
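import pandas as pd
# Collapse only the "Group 1" rows into lists; every other row (NaN groups too) stays as-is
df_group1 = df[df['Group'] == 'Group 1'].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))
df_rest = df[df['Group'] != 'Group 1']
# Stack both parts, then sort so the rows come back in top-to-bottom order
df_out = pd.concat([df_group1, df_rest], axis=0, sort=False).sort_values(['Input File Name','Page Number','Top']).reset_index(drop=True)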
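A self-contained sketch of the same two-step approach, run on a trimmed copy of the question's data (only some of the columns are kept, to stay short). It reuses one boolean mask for both parts and sorts by the grouping keys at the end so that the untouched "5 Mile" row comes back first, as in the desired result; that final sort is an assumption about the wanted row order rather than something the original answer specified.

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's data (only some of the columns)
df = pd.DataFrame(
    {
        "Input File Name": ["109 S Ankeny Blvd_Flyer"] * 10,
        "Page Number": [2] * 10,
        "Content": ["5 Mile", "Population", "12,898", "59,989", "67,553",
                    "Dustin Whitehead, CCIM", "Vice President ",
                    "AVG. HH Income", "$84,258", "$98,879"],
        "Top": [890, 914, 914, 914, 914, 918, 935, 940, 940, 940],
        "Left": [459, 67, 244, 352, 460, 679, 679, 67, 244, 352],
        "Group": [np.nan] + ["Group 1"] * 9,
    },
    index=range(82, 92),
)

keys = ["Input File Name", "Page Number", "Top"]
is_group1 = df["Group"] == "Group 1"

# Step 1: collapse the "Group 1" rows into one row of lists per key
df_group1 = df[is_group1].groupby(keys, as_index=False).agg(lambda x: list(x))

# Step 2: keep every other row (including NaN groups) untouched
df_rest = df[~is_group1]

# Stack both parts and restore the top-to-bottom reading order
df_out = (
    pd.concat([df_group1, df_rest], axis=0, sort=False)
    .sort_values(keys)
    .reset_index(drop=True)
)
print(df_out)
```

Putting df_group1 first in the concat keeps its column order (keys first, then the aggregated columns), which matches the column order of the desired result.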