
Splitting data frame into smaller data frames based on unique column values

This is my data frame:

    Quantity     Code         Value       
0       1757     08951201     717.0
1       1100     08A85800       0.0
2       2500     08A85800       0.0
3        323     08951201       0.0
4        800     08A85800       0.0

I want to split this into smaller data frames based on the Code column. (E.g. this one should split into df1 with all the 08951201 codes and df2 with all the 08A85800 codes.)

Edit: I'd also love to have a way to merge them back into the original dataframe, in the same order, after some value calculations I'm going to perform.

Use groupby and apply your custom function to process your sub dataframe:

groups = df.groupby('Code')
print(list(groups))

# Output:
[('08951201',    Quantity      Code  Value
0      1757  08951201  717.0
3       323  08951201    0.0),

('08A85800',    Quantity      Code  Value
1      1100  08A85800    0.0
2      2500  08A85800    0.0
4       800  08A85800    0.0)]

Now suppose you want to sum Value within each Code:

>>> df.groupby('Code')['Value'].sum()
Code
08951201    717.0
08A85800      0.0
Name: Value, dtype: float64
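The question also asks how to put the pieces back together in the original order. One possible approach, sketched below, relies on the fact that each sub-dataframe keeps its original row labels, so concatenating the pieces and sorting by index restores the order. The Share column and the parts name are hypothetical, chosen only to stand in for "some value calculations".

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [1757, 1100, 2500, 323, 800],
    'Code': ['08951201', '08A85800', '08A85800', '08951201', '08A85800'],
    'Value': [717.0, 0.0, 0.0, 0.0, 0.0],
})

# Split into one sub-dataframe per code; each piece keeps its original row labels.
parts = {code: sub.copy() for code, sub in df.groupby('Code')}

# Hypothetical calculation: each row's share of its group's total Quantity.
for sub in parts.values():
    sub['Share'] = sub['Quantity'] / sub['Quantity'].sum()

# Concatenate the pieces and sort by the preserved index to restore the order.
merged = pd.concat(list(parts.values())).sort_index()
print(merged)
```

If the calculation is a simple per-group aggregate like this one, groupby(...).transform can often write the result straight into the original dataframe without splitting at all.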

As suggested, you can use groupby() on your dataframe to split it by the values of one column:

import pandas as pd

cols = ['Quantity', 'Code', 'Value']
data = [[1757,     '08951201',     717.0],
 [1100,     '08A85800',       0.0],
 [2500,     '08A85800',       0.0],
 [323,    '08951201',      0.0],
 [800,    '08A85800',       0.0]]

df = pd.DataFrame(data, columns=cols)

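# Group by the column name as a scalar so the group keys are plain strings
# (a one-element list makes recent pandas versions return 1-tuples as keys)
groups = df.groupby('Code')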

You can then look up the rows of each group with groups.indices, which returns a dict with the 'Code' values as keys and arrays of row positions as values. Finally, if you want every sub-dataframe, call group_list = list(groups). I suggest doing the work in two steps (first group by, then call list), because that way you can still call other methods on the GroupBy object (groups).
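As a small sketch of what that looks like on the sample data (the positions and sub names are just illustrative), groups.indices can be inspected directly, and get_group fetches a single sub-dataframe by its key:

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [1757, 1100, 2500, 323, 800],
    'Code': ['08951201', '08A85800', '08A85800', '08951201', '08A85800'],
    'Value': [717.0, 0.0, 0.0, 0.0, 0.0],
})

groups = df.groupby('Code')

# groups.indices maps each code to an array of integer row positions;
# convert the arrays to lists for easier reading.
positions = {code: idx.tolist() for code, idx in groups.indices.items()}
print(positions)

# Fetch the sub-dataframe for one particular code.
sub = groups.get_group('08A85800')
print(sub)
```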


EDIT

Then, if you want a particular dataframe, you can call:

 df_i = group_list[i][1]

group_list[i] is the i-th element of the list, but it is a tuple (group_val, group_df), where group_val is the value associated with this new dataframe ('08951201' or '08A85800') and group_df is the new dataframe itself.
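Since every element is a (group_val, group_df) pair, tuple unpacking is usually more readable than indexing with [i][1]. A minimal sketch, assuming the sample data from the question (first_code and first_df are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity': [1757, 1100, 2500, 323, 800],
    'Code': ['08951201', '08A85800', '08A85800', '08951201', '08A85800'],
    'Value': [717.0, 0.0, 0.0, 0.0, 0.0],
})

group_list = list(df.groupby('Code'))

# Unpack a single element into its key and its sub-dataframe.
first_code, first_df = group_list[0]

# Or loop over all groups, unpacking each pair as you go.
for code, sub in group_list:
    print(code, len(sub))
```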


 