Is it possible to apply groupby inside a lambda function?

I want to apply the pandas groupby method to three columns (Input File Name, Page Number, Top), but only to the rows where the "Group" column equals "Group 1", and keep the rest of the rows as they are.

Dataset (edited):

{'Input File Name': {82: '109 S Ankeny Blvd_Flyer',
  83: '109 S Ankeny Blvd_Flyer',
  84: '109 S Ankeny Blvd_Flyer',
  85: '109 S Ankeny Blvd_Flyer',
  86: '109 S Ankeny Blvd_Flyer',
  87: '109 S Ankeny Blvd_Flyer',
  88: '109 S Ankeny Blvd_Flyer',
  89: '109 S Ankeny Blvd_Flyer',
  90: '109 S Ankeny Blvd_Flyer',
  91: '109 S Ankeny Blvd_Flyer'},
 'Page Number': {82: 2,
  83: 2,
  84: 2,
  85: 2,
  86: 2,
  87: 2,
  88: 2,
  89: 2,
  90: 2,
  91: 2},
 'Content': {82: '5 Mile',
  83: 'Population',
  84: '12,898',
  85: '59,989',
  86: '67,553',
  87: 'Dustin Whitehead, CCIM',
  88: 'Vice President ',
  89: 'AVG. HH Income',
  90: '$84,258',
  91: '$98,879'},
 'Font Class': {82: 0,
  83: 1,
  84: 1,
  85: 1,
  86: 1,
  87: 0,
  88: 1,
  89: 1,
  90: 1,
  91: 1},
 'Font Size': {82: 11,
  83: 11,
  84: 11,
  85: 11,
  86: 11,
  87: 11,
  88: 11,
  89: 11,
  90: 11,
  91: 11},
 'Top': {82: 890,
  83: 914,
  84: 914,
  85: 914,
  86: 914,
  87: 918,
  88: 935,
  89: 940,
  90: 940,
  91: 940},
 'Left': {82: 459,
  83: 67,
  84: 244,
  85: 352,
  86: 460,
  87: 679,
  88: 679,
  89: 67,
  90: 244,
  91: 352},
 'Width': {82: 41,
  83: 72,
  84: 42,
  85: 46,
  86: 43,
  87: 171,
  88: 102,
  89: 111,
  90: 54,
  91: 54},
 'Height': {82: 18,
  83: 17,
  84: 17,
  85: 17,
  86: 17,
  87: 18,
  88: 17,
  89: 17,
  90: 17,
  91: 17},
 'Group_ID': {82: 6777192940,
  83: 7210324356,
  84: 1579068838,
  85: 6589580993,
  86: 370057979,
  87: 381660731,
  88: 8098819066,
  89: 7210324356,
  90: 1579068838,
  91: 6589580993},
 'Group': {82: nan,
  83: 'Group 1',
  84: 'Group 1',
  85: 'Group 1',
  86: 'Group 1',
  87: 'Group 1',
  88: 'Group 1',
  89: 'Group 1',
  90: 'Group 1',
  91: 'Group 1'},
 }
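The dict above appears to be the output of df.to_dict(). The bare nan in it stands for NumPy's NaN, so pasting it as-is raises a NameError unless nan is defined first. A minimal sketch of loading it, using a three-row excerpt of the same data (the full dict loads the same way):

```python
import pandas as pd
from numpy import nan  # makes the bare `nan` in the pasted dict valid Python

# Excerpt of the dict above (first three rows, three columns).
data = {
    "Content": {82: "5 Mile", 83: "Population", 84: "12,898"},
    "Top": {82: 890, 83: 914, 84: 914},
    "Group": {82: nan, 83: "Group 1", 84: "Group 1"},
}

# Outer keys become columns, inner keys become the row index labels.
df = pd.DataFrame(data)
print(df.index.tolist())  # [82, 83, 84]
```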

Below is the code I've tried, where df is my entire dataset:

df[df['Group'].apply(lambda x:x=='Group 1')].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))

With the code above, the grouping is applied to the Group 1 rows, but all the other rows are dropped. So I tried another way:

df.apply(lambda x: x.groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x)) if x['Group']=='Group 1' else x)

As a result I get KeyError: ('Group', 'occurred at index Input File Name').

Is this approach possible?
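On the KeyError itself: with the default axis=0, DataFrame.apply calls the function once per column and passes that column as a Series indexed by row labels. So x['Group'] looks for a row labelled 'Group', and the error message names the first column it tried ('Input File Name'). Passing axis=1 would hand over single rows instead, which have nothing to group, so the condition has to be applied outside apply. A minimal illustration on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame; only the column layout matters here.
df = pd.DataFrame({"Top": [890, 914, 914], "Group": [None, "Group 1", "Group 1"]})

# With the default axis=0, apply() calls the function once per column,
# so each x is a column Series whose .name is the column label.
names = df.apply(lambda x: x.name)
print(names.tolist())  # ['Top', 'Group']

# A column Series is indexed by row labels (0, 1, 2), so x["Group"] fails.
raised = False
try:
    df.apply(lambda x: x["Group"])
except KeyError:
    raised = True
print(raised)  # True
```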

Desired result (edited):

{'Input File Name': {0: '109 S Ankeny Blvd_Flyer',
  1: '109 S Ankeny Blvd_Flyer',
  2: '109 S Ankeny Blvd_Flyer',
  3: '109 S Ankeny Blvd_Flyer',
  4: '109 S Ankeny Blvd_Flyer'},
 'Page Number': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2},
 'Top': {0: 890, 1: 914, 2: 918, 3: 935, 4: 940},
 'Content': {0: '5 Mile',
  1: ['Population', '12,898', '59,989', '67,553'],
  2: ['Dustin Whitehead, CCIM'],
  3: ['Vice President'],
  4: ['AVG. HH Income', '$84,258', '$98,879']},
 'Font Class': {0: 0, 1: [1, 1, 1, 1], 2: [0], 3: [1], 4: [1, 1, 1]},
 'Font Size': {0: 11,
  1: [11, 11, 11, 11],
  2: [11],
  3: [11],
  4: [11, 11, 11]},
 'Left': {0: 459,
  1: [67, 244, 352, 460],
  2: [679],
  3: [679],
  4: [67, 244, 352]},
 'Width': {0: 41, 1: [72, 42, 46, 43], 2: [171], 3: [102], 4: [111, 54, 54]},
 'Height': {0: 18, 1: [17, 17, 17, 17], 2: [18], 3: [17], 4: [17, 17, 17]},
 'Group_ID': {0: 6777192940,
  1: [7210324356, 1579068838, 6589580993, 370057979],
  2: [381660731],
  3: [8098819066],
  4: [7210324356, 1579068838, 6589580993]},
 'Group': {0: nan,
  1: ['Group 1', 'Group 1', 'Group 1', 'Group 1'],
  2: ['Group 1'],
  3: ['Group 1'],
  4: ['Group 1', 'Group 1', 'Group 1']},
 }

How about doing it in two steps:

  1. Filter the rows where Group == 'Group 1' (and group those)
  2. Concat the result with the rest

Something like:

df_group1 = df[df['Group'] == 'Group 1'].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))
df_rest = df[df['Group'] != 'Group 1']
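The two masks in these steps split the rows cleanly even though "Group" contains NaN: pandas evaluates NaN == 'Group 1' as False and NaN != 'Group 1' as True, so the NaN row lands in df_rest rather than being lost. A small check on a hypothetical four-value Series:

```python
import numpy as np
import pandas as pd

# Same kind of "Group" column as in the question: one NaN, the rest "Group 1".
group = pd.Series([np.nan, "Group 1", "Group 1", "Group 1"])

is_g1 = group == "Group 1"    # NaN compares as False here
not_g1 = group != "Group 1"   # ... and as True here

print(is_g1.tolist())   # [False, True, True, True]
print(not_g1.tolist())  # [True, False, False, False]
```

Because the masks are exact complements, every row ends up in exactly one of the two parts.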

But combining those with pd.concat needs more information. For example: should each new aggregate be a list? What should happen to the other groups, None?

Without the exact desired output, I cannot help you further with this.


EDIT: with the input and output you have provided, you can just follow the steps above and then concat the two parts.

Full code:

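import pandas as pd
# group only the "Group 1" rows, collecting each remaining column into a list
df_group1 = df[df['Group'] == 'Group 1'].groupby(['Input File Name','Page Number','Top'], as_index=False).agg(lambda x: list(x))
# every other row (NaN included, since NaN != 'Group 1') stays unchanged
df_rest = df[df['Group'] != 'Group 1']
df_out = pd.concat([df_group1, df_rest], axis=0, sort=False)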
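For reference, a self-contained sketch of the full code above run on the sample rows from the question (rebuilt compactly from the dict). Two small additions that are not part of the answer: the combined frame is sorted by Top and its index is reset, which is what makes it line up with the desired result (index 0 to 4, rows in Top order). agg(list) is used as shorthand for agg(lambda x: list(x)); this assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Sample rows 82..91 from the question.
df = pd.DataFrame(
    {
        "Input File Name": ["109 S Ankeny Blvd_Flyer"] * 10,
        "Page Number": [2] * 10,
        "Content": ["5 Mile", "Population", "12,898", "59,989", "67,553",
                    "Dustin Whitehead, CCIM", "Vice President ", "AVG. HH Income",
                    "$84,258", "$98,879"],
        "Font Class": [0, 1, 1, 1, 1, 0, 1, 1, 1, 1],
        "Font Size": [11] * 10,
        "Top": [890, 914, 914, 914, 914, 918, 935, 940, 940, 940],
        "Left": [459, 67, 244, 352, 460, 679, 679, 67, 244, 352],
        "Width": [41, 72, 42, 46, 43, 171, 102, 111, 54, 54],
        "Height": [18, 17, 17, 17, 17, 18, 17, 17, 17, 17],
        "Group_ID": [6777192940, 7210324356, 1579068838, 6589580993, 370057979,
                     381660731, 8098819066, 7210324356, 1579068838, 6589580993],
        "Group": [np.nan] + ["Group 1"] * 9,
    },
    index=range(82, 92),
)

keys = ["Input File Name", "Page Number", "Top"]
is_g1 = df["Group"] == "Group 1"  # NaN rows evaluate to False

# Step 1: group only the "Group 1" rows; every other column becomes a list.
df_group1 = df[is_g1].groupby(keys, as_index=False).agg(list)

# Step 2: all remaining rows are kept unchanged.
df_rest = df[~is_g1]

# Step 3: stack both parts, then restore Top order and a fresh 0..n-1 index.
df_out = (
    pd.concat([df_group1, df_rest], axis=0, sort=False)
    .sort_values("Top")
    .reset_index(drop=True)
)
print(df_out[["Top", "Content"]])
```

If the position of the ungrouped rows relative to the grouped ones does not matter, the sort_values/reset_index step can be dropped.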
