
How to split a DataFrame on each different value in a column?

Below is an example DataFrame.

      0      1     2     3          4
0   0.0  13.00  4.50  30.0   0.0,13.0
1   0.0  13.00  4.75  30.0   0.0,13.0
2   0.0  13.00  5.00  30.0   0.0,13.0
3   0.0  13.00  5.25  30.0   0.0,13.0
4   0.0  13.00  5.50  30.0   0.0,13.0
5   0.0  13.00  5.75   0.0   0.0,13.0
6   0.0  13.00  6.00  30.0   0.0,13.0
7   1.0  13.25  0.00  30.0  0.0,13.25
8   1.0  13.25  0.25   0.0  0.0,13.25
9   1.0  13.25  0.50  30.0  0.0,13.25
10  1.0  13.25  0.75  30.0  0.0,13.25
11  2.0  13.25  1.00  30.0  0.0,13.25
12  2.0  13.25  1.25  30.0  0.0,13.25
13  2.0  13.25  1.50  30.0  0.0,13.25
14  2.0  13.25  1.75  30.0  0.0,13.25
15  2.0  13.25  2.00  30.0  0.0,13.25
16  2.0  13.25  2.25  30.0  0.0,13.25

I want to split this into new dataframes when the row in column 0 changes.

      0      1     2     3          4
0   0.0  13.00  4.50  30.0   0.0,13.0
1   0.0  13.00  4.75  30.0   0.0,13.0
2   0.0  13.00  5.00  30.0   0.0,13.0
3   0.0  13.00  5.25  30.0   0.0,13.0
4   0.0  13.00  5.50  30.0   0.0,13.0
5   0.0  13.00  5.75   0.0   0.0,13.0
6   0.0  13.00  6.00  30.0   0.0,13.0

7   1.0  13.25  0.00  30.0  0.0,13.25
8   1.0  13.25  0.25   0.0  0.0,13.25
9   1.0  13.25  0.50  30.0  0.0,13.25
10  1.0  13.25  0.75  30.0  0.0,13.25

11  2.0  13.25  1.00  30.0  0.0,13.25
12  2.0  13.25  1.25  30.0  0.0,13.25
13  2.0  13.25  1.50  30.0  0.0,13.25
14  2.0  13.25  1.75  30.0  0.0,13.25
15  2.0  13.25  2.00  30.0  0.0,13.25
16  2.0  13.25  2.25  30.0  0.0,13.25

I've tried adapting the following solutions, without any luck so far: "Split array at value in numpy" and "Split a large pandas dataframe".

It looks like you want to group by the first column. You could create a dictionary from the groupby object, with the group keys as the dictionary keys:

out = dict(tuple(df.groupby(0)))
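A minimal, self-contained sketch of the dict approach, assuming integer column labels as in the example (the data below is a cut-down copy of the question's first three columns). Iterating a `GroupBy` yields `(key, sub_frame)` pairs, which is why `dict(tuple(...))` works:

```python
import pandas as pd

# Cut-down copy of the question's data; column labels are the integers 0, 1, 2.
df = pd.DataFrame({
    0: [0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    1: [13.0, 13.0, 13.0, 13.25, 13.25, 13.25, 13.25],
    2: [4.50, 4.75, 5.00, 0.00, 0.25, 1.00, 1.25],
})

# Each iteration step of df.groupby(0) is a (value_in_column_0, sub_frame) pair,
# so dict() turns them into {value: rows_with_that_value}.
out = dict(tuple(df.groupby(0)))

for key, sub_df in out.items():
    print(key, sub_df.index.tolist())
```

Each sub-frame keeps the original row labels; call `.reset_index(drop=True)` on it if you want them to start at 0 again.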

Alternatively, we could build a list from the groupby object. This is more useful when we only need positional indexing rather than lookup by grouping key:

out = [sub_df for _, sub_df in df.groupby(0)]
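One detail worth knowing when indexing the list by position: `groupby` sorts the group keys by default (`sort=True`). The question's frame is already sorted by column 0, so this makes no difference there, but with hypothetical unsorted data the two orders differ:

```python
import pandas as pd

# Hypothetical data that is NOT sorted by column 0 (unlike the question's frame).
df = pd.DataFrame({0: [2.0, 2.0, 0.0, 1.0], 1: [10, 11, 12, 13]})

# Default sort=True: list positions follow the sorted group keys 0.0, 1.0, 2.0.
by_value = [sub_df for _, sub_df in df.groupby(0)]

# sort=False: list positions follow the order in which each key first appears.
by_appearance = [sub_df for _, sub_df in df.groupby(0, sort=False)]

print([g[0].iloc[0] for g in by_value])
print([g[0].iloc[0] for g in by_appearance])
```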

We could then index the dict by grouping key, or the list by the group's position:

print(out[0])

    0     1     2     3         4
0  0.0  13.0  4.50  30.0  0.0,13.0
1  0.0  13.0  4.75  30.0  0.0,13.0
2  0.0  13.0  5.00  30.0  0.0,13.0
3  0.0  13.0  5.25  30.0  0.0,13.0
4  0.0  13.0  5.50  30.0  0.0,13.0
5  0.0  13.0  5.75   0.0  0.0,13.0
6  0.0  13.0  6.00  30.0  0.0,13.0
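In the question's data `out[0]` returns the same frame either way: the list's first element is the group with the smallest key, and the dict lookup works because that key is `0.0` and Python treats `0` and `0.0` as the same dict key. A sketch with hypothetical data whose keys do not include 0 shows where the two differ:

```python
import pandas as pd

# Hypothetical data whose column-0 values are 1.0 and 3.0 (no 0.0).
df = pd.DataFrame({0: [1.0, 1.0, 3.0], 1: [13.0, 13.25, 13.25]})

as_dict = dict(tuple(df.groupby(0)))
as_list = [sub_df for _, sub_df in df.groupby(0)]

print(as_dict[3.0])   # lookup by group key
print(as_list[1])     # lookup by position: second key in sorted order, i.e. 3.0
# as_dict[0] would raise KeyError here, since no row has 0.0 in column 0,
# while as_list[0] is simply the first group (the rows where column 0 is 1.0).
```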

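Based on this part of the question:

"I want to split this into new dataframes when the row in column 0 changes."

If you only want to start a new group each time the value in column 0 changes (rather than collecting all rows with the same value), you can try: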

# use df[0] instead of df['0'] if the column labels are integers
d = dict([*df.groupby(df['0'].ne(df['0'].shift()).cumsum())])

print(d[1])
print(d[2])

     0     1     2     3         4
0  0.0  13.0  4.50  30.0  0.0,13.0
1  0.0  13.0  4.75  30.0  0.0,13.0
2  0.0  13.0  5.00  30.0  0.0,13.0
3  0.0  13.0  5.25  30.0  0.0,13.0
4  0.0  13.0  5.50  30.0  0.0,13.0
5  0.0  13.0  5.75   0.0  0.0,13.0
6  0.0  13.0  6.00  30.0  0.0,13.0
      0      1     2     3          4
7   1.0  13.25  0.00  30.0  0.0,13.25
8   1.0  13.25  0.25   0.0  0.0,13.25
9   1.0  13.25  0.50  30.0  0.0,13.25
10  1.0  13.25  0.75  30.0  0.0,13.25
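To see why this produces one group per consecutive run, it helps to look at the intermediate key. A sketch with hypothetical data, assuming an integer column label `0`:

```python
import pandas as pd

df = pd.DataFrame({0: [0.0, 0.0, 1.0, 1.0, 1.0, 2.0]})

# shift() moves column 0 down one row; row 0 gets NaN, and 0.0 != NaN is True.
changed = df[0].ne(df[0].shift())   # True on the first row of every run
run_id = changed.cumsum()           # running count of True values -> 1, 1, 2, 2, 2, 3

d = dict([*df.groupby(run_id)])     # {1: rows 0-1, 2: rows 2-4, 3: row 5}
```

Because the group keys are run numbers rather than column-0 values, `d[1]`, `d[2]`, ... are the runs in the order they occur.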

I would use GroupBy.__iter__:

# string column labels ('0'):
d = dict(df.groupby(df['0'].diff().ne(0).cumsum()).__iter__())
# integer column labels (0), as in the example frame:
#d = dict(df.groupby(df[0].diff().ne(0).cumsum()).__iter__())
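For a numeric column, the `diff()`-based key gives the same run numbers as the `shift()` comparison. A sketch with hypothetical data (integer column label `0`) that checks this, and notes one practical difference between the two:

```python
import pandas as pd

df = pd.DataFrame({0: [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]})

# diff() is NaN on row 0, and NaN != 0 is True, so row 0 starts run 1.
key_diff = df[0].diff().ne(0).cumsum()
key_shift = df[0].ne(df[0].shift()).cumsum()
print(key_diff.equals(key_shift))   # both give 1, 1, 2, 2, 3, 3

# iter(gb) calls gb.__iter__(), so these two lines build the same dict.
d1 = dict(df.groupby(key_diff).__iter__())
d2 = dict(iter(df.groupby(key_diff)))

# The shift() comparison also works on non-numeric columns; diff() needs
# values that support subtraction, so it would fail on strings.
labels = pd.Series(["a", "a", "b", "a"])
label_runs = labels.ne(labels.shift()).cumsum()   # 1, 1, 2, 3
```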

Note that if the same value appears again later, in a non-consecutive run, this approach creates a separate group for each run; with plain groupby(0), all rows with that value end up in the same group.
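A small sketch of that difference, using hypothetical data in which the value 0.0 reappears after a run of 1.0 (integer column label `0` assumed):

```python
import pandas as pd

# Hypothetical data where the value 0.0 reappears after a run of 1.0.
df = pd.DataFrame({0: [0.0, 0.0, 1.0, 0.0]})

# Plain groupby: one group per distinct value, wherever the rows are.
by_value = dict(tuple(df.groupby(0)))

# Run-based key: a new group every time the value changes.
by_run = dict(tuple(df.groupby(df[0].ne(df[0].shift()).cumsum())))

for key, sub_df in by_value.items():
    print("value", key, "->", sub_df.index.tolist())
for key, sub_df in by_run.items():
    print("run", key, "->", sub_df.index.tolist())
```

Here `by_value` has two groups (rows 0, 1 and 3 share the key 0.0), while `by_run` has three: rows 0 to 1, row 2, and row 3.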
