Splitting data frame into smaller data frames based on unique column values
This is my data frame:
   Quantity      Code  Value
0      1757  08951201  717.0
1      1100  08A85800    0.0
2      2500  08A85800    0.0
3       323  08951201    0.0
4       800  08A85800    0.0
I want to split this into smaller data frames based on the Code column (e.g. this one should split into df1 with all the 08951201 codes and df2 with all the 08A85800 codes).
Edit: I'd also like a way to merge them back into the original data frame, in the same order, after performing some calculations on the values.
Use groupby and apply your custom function to process each sub-dataframe:
groups = df.groupby('Code')
print(list(groups))
# Output:
[('08951201',    Quantity      Code  Value
0      1757  08951201  717.0
3       323  08951201    0.0),
 ('08A85800',    Quantity      Code  Value
1      1100  08A85800    0.0
2      2500  08A85800    0.0
4       800  08A85800    0.0)]
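If you want each piece as its own object (the df1 and df2 from the question), a minimal sketch is to collect the groups into a dict keyed by the Code value. The dict name `sub_dfs` is only illustrative, and Code is assumed to be stored as strings so leading zeros survive.

```python
import pandas as pd

# Rebuild the example frame from the question; Code is kept as a string
# so that values with leading zeros such as '08951201' are preserved.
df = pd.DataFrame(
    {
        "Quantity": [1757, 1100, 2500, 323, 800],
        "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
        "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
    }
)

# One sub-dataframe per unique Code; each keeps its original index labels.
sub_dfs = {code: sub_df for code, sub_df in df.groupby("Code")}

df1 = sub_dfs["08951201"]  # rows 0 and 3
df2 = sub_dfs["08A85800"]  # rows 1, 2 and 4
print(df1)
print(df2)
```

Because each piece keeps its original index labels, the pieces can later be concatenated and sorted by index to get back the original row order.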
Now suppose you want to sum Value for each Code:
>>> df.groupby('Code')['Value'].sum()
Code
08951201    717.0
08A85800      0.0
Name: Value, dtype: float64
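If the goal is to compute a per-Code result and attach it back to every original row (which also covers the "same order" part of the question's edit), `transform` may be simpler than splitting at all, because it returns a result aligned to the original index. A sketch, where the new column name `Value_sum` is chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Quantity": [1757, 1100, 2500, 323, 800],
        "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
        "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
    }
)

# transform("sum") computes the sum for each Code group and broadcasts it
# back to every row of that group, keeping the original index and order.
df["Value_sum"] = df.groupby("Code")["Value"].transform("sum")
print(df)
```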
As suggested, you could use groupby() on your dataframe to separate the rows by the values of one column:
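import pandas as pd

cols = ['Quantity', 'Code', 'Value']
data = [[1757, '08951201', 717.0],
        [1100, '08A85800', 0.0],
        [2500, '08A85800', 0.0],
        [323, '08951201', 0.0],
        [800, '08A85800', 0.0]]
df = pd.DataFrame(data, columns=cols)
# Group by a single column name (not a one-element list), so that in recent
# pandas versions the group keys are plain strings like '08951201', not tuples.
groups = df.groupby('Code')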
Then you can recover the rows belonging to each group with groups.indices, which returns a dict with the 'Code' values as keys and arrays of the rows' positions as values.
Finally, if you want every sub-dataframe, you can call group_list = list(groups). I suggest doing this in two steps (first groupby, then list), because that way you can still call other methods on the GroupBy object (groups).
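A short sketch of those steps on the same example frame; `get_group` is shown only as one example of another method you can still call on the GroupBy object:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Quantity": [1757, 1100, 2500, 323, 800],
        "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
        "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
    }
)
groups = df.groupby("Code")

# Dict: Code value -> NumPy array with the positions of that group's rows.
positions = groups.indices
print(positions)

# Each list element is a (code, sub_dataframe) tuple, ordered by Code.
group_list = list(groups)
for code, sub_df in group_list:
    print(code, len(sub_df))

# The GroupBy object is still usable afterwards, e.g. to fetch one group.
print(groups.get_group("08A85800"))
```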
EDIT
Then, if you want a particular dataframe, you could call:
df_i = group_list[i][1]
group_list[i] is the i-th group, stored as a tuple (group_val, group_df), where group_val is the 'Code' value associated with this sub-dataframe ('08951201' or '08A85800') and group_df is the sub-dataframe itself.
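For the question's edit (merging the pieces back in the original order after some calculations), one option is to modify each group_df, concatenate the pieces with pd.concat, and sort by the original index. This is a sketch assuming the calculation is done per group; doubling Value below is only a placeholder for your own calculation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Quantity": [1757, 1100, 2500, 323, 800],
        "Code": ["08951201", "08A85800", "08A85800", "08951201", "08A85800"],
        "Value": [717.0, 0.0, 0.0, 0.0, 0.0],
    }
)
group_list = list(df.groupby("Code"))

processed = []
for group_val, group_df in group_list:
    group_df = group_df.copy()  # work on a copy, not a slice of df
    # Placeholder calculation: replace with your own per-group logic.
    group_df["Value"] = group_df["Value"] * 2
    processed.append(group_df)

# Every piece kept its original index labels, so sorting by index
# restores the original row order.
merged = pd.concat(processed).sort_index()
print(merged)
```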