How to group by column and copy all values of a group to one row in pandas?

This is a sample of my dataset:

Consumer_num | billed_units  
29           | 984
29           | 1244
29           | 2323
29           | 1232
29           | 1150
30           | 3222
30           | 1444
30           | 2124

I want to group by Consumer_num and then put all values (billed_units) of each group into new columns, one column per value. So my required output is:

Consumer_num | month 1 | month 2 | month 3 | month 4 | month 5
29           | 984     | 1244    | 2323    | 1232    | 1150
30           | 3222    | 1444    | 2124    | NaN     | NaN

This is what I've done so far:

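import pandas as pd

# Collect each consumer's billed_units into an array (note: unique() also drops repeated values)
group = df.groupby('Consumer_num')['billed_units'].unique()
group = group[group.apply(lambda x: len(x) > 1)]  # keep consumers with more than one value
df = group.to_frame()
print(df)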

Output:

Consumer_num | billed_units  
29           | [984,1244,2323,1232,1150]
30           | [3222,1444,2124]

I don't know whether my approach is correct. If it is, how can I separate the billed_units of each consumer and add them to new columns as shown in my required output? Or is there a better method to achieve my required output?
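The list-based approach from the question can be finished by expanding each consumer's list into its own columns. This is a minimal sketch that rebuilds the sample data; it swaps `unique()` for `agg(list)`, since `unique()` would silently drop a `billed_units` value that repeats within one consumer. The names `lists` and `wide` are just for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})

# One list per consumer; agg(list) keeps duplicates, unlike unique()
lists = df.groupby('Consumer_num')['billed_units'].agg(list)

# Expand each list into columns; shorter lists are padded with NaN
wide = pd.DataFrame(lists.tolist(), index=lists.index)
wide.columns = ['month {}'.format(i + 1) for i in range(wide.shape[1])]
wide = wide.reset_index()
print(wide)
```

This works, but it goes through Python lists; the reshaping answers below stay inside pandas and scale better.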

solution

c = 'Consumer_num'
m = 'month {}'.format
df.set_index(
    [c, df.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).reset_index()

   Consumer_num  month 1  month 2  month 3  month 4  month 5
0            29    984.0   1244.0   2323.0   1232.0   1150.0
1            30   3222.0   1444.0   2124.0      NaN      NaN

how it works

  • put 'Consumer_num' into a variable c for convenience
  • put the mapper function 'month {}'.format into a variable m for convenience
  • set the index with two keys to make a pd.MultiIndex
    • I use groupby and cumcount to create the level to unstack with (each row's 1-based position within its consumer's group)
    • then I unstack that level into columns
  • finally use the mapper function to rename the columns
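To see the intermediate objects those bullets describe, the one-liner can be broken into steps. This sketch rebuilds the question's sample data; the names `key`, `s`, `wide` and `result` are hypothetical and only name the intermediate results.

```python
import pandas as pd

df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})
c = 'Consumer_num'
m = 'month {}'.format

# Step 1: 1-based position of each row within its consumer's group
key = df.groupby(c).cumcount() + 1
print(key.tolist())  # [1, 2, 3, 4, 5, 1, 2, 3]

# Step 2: two-level index (Consumer_num, position), one value per pair
s = df.set_index([c, key]).billed_units
print(s.index.nlevels)  # 2

# Step 3: move the position level (the last one) into the columns; gaps become NaN
wide = s.unstack()

# Step 4: rename 1..5 to 'month 1'..'month 5' and restore Consumer_num as a column
result = wide.rename(columns=m).reset_index()
print(result)
```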

response to comments

One approach for limiting the number of months is to use iloc. The following limits us to 3 months; adjust the slice to take the first 5 instead. The NaNs should take care of themselves.

c = 'Consumer_num'
m = 'month {}'.format
df.set_index(
    [c, df.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).iloc[:, :3].reset_index()
#                                         ^..........^

   Consumer_num  month 1  month 2  month 3
0            29    984.0   1244.0   2323.0
1            30   3222.0   1444.0   2124.0
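One caveat with iloc: it can only trim, so asking for more months than the longest group has does not create the extra columns. If you need exactly n month columns regardless of the data, reindexing the columns both trims and pads with NaN. A sketch, where `n` is a hypothetical setting:

```python
import pandas as pd

df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})
c = 'Consumer_num'
m = 'month {}'.format
n = 6  # hypothetical: desired number of month columns

wide = df.set_index([c, df.groupby(c).cumcount() + 1]).billed_units.unstack()

# reindex drops positions > n and adds missing positions 1..n as all-NaN columns
result = wide.reindex(columns=range(1, n + 1)).rename(columns=m).reset_index()
print(result)
```

With `n = 3` this gives the same result as the iloc version above.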

Or you could pre-process, keeping only the first 3 rows of each consumer before reshaping:

c = 'Consumer_num'
m = 'month {}'.format
d1 = df.groupby(c).head(3)  # pre-process and take just first 3
d1.set_index(
    [c, d1.groupby(c).cumcount() + 1]
).billed_units.unstack().rename(columns=m).reset_index()

You could use pivot like this:

In [70]: dfm = df.assign(m=df.groupby('Consumer_num').cumcount().add(1))

In [71]: dfm.pivot(index='Consumer_num', columns='m', values='billed_units').add_prefix('month ')
Out[71]:
m             month 1  month 2  month 3  month 4  month 5
Consumer_num
29              984.0   1244.0   2323.0   1232.0   1150.0
30             3222.0   1444.0   2124.0      NaN      NaN

Details

In [75]: df
Out[75]:
   Consumer_num  billed_units
0            29           984
1            29          1244
2            29          2323
3            29          1232
4            29          1150
5            30          3222
6            30          1444
7            30          2124

In [76]: dfm
Out[76]:
   Consumer_num  billed_units  m
0            29           984  1
1            29          1244  2
2            29          2323  3
3            29          1232  4
4            29          1150  5
5            30          3222  1
6            30          1444  2
7            30          2124  3
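The outputs above show 984.0 rather than 984 because a column containing NaN is upcast to float64. If integer values matter, one option (assuming pandas 1.0 or later, which has the nullable `Int64` dtype) is to convert after pivoting; missing entries then display as `<NA>`.

```python
import pandas as pd

df = pd.DataFrame({
    'Consumer_num': [29, 29, 29, 29, 29, 30, 30, 30],
    'billed_units': [984, 1244, 2323, 1232, 1150, 3222, 1444, 2124],
})
dfm = df.assign(m=df.groupby('Consumer_num').cumcount().add(1))

wide = (
    dfm.pivot(index='Consumer_num', columns='m', values='billed_units')
    .add_prefix('month ')
    .astype('Int64')  # nullable integer dtype: 984.0 becomes 984, NaN becomes <NA>
)
print(wide)
```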
