简体   繁体   中英

Python pandas grouping a dataframe by the unique value of a column

I have a dataframe in this format

A        B
1990-02  1 
1990-03  1
1999-05  1
1992-08  2
1996-12  2
2020-01  2
1990-05  3
1995-08  3
1999-11  3
2021-12  3

How can i convert this dataframe into groups base on the unique values of Column B

So my results should be in this format

[[[1990-02, 1],[1990-03, 1],[1999-05, 1]],
 [[1992-08, 2],[1996-12, 2],[2020-01, 2]],
 [[1990-05, 3],[1995-08, 3],[1999-11, 3],[2021-12, 3]]
]

Here's one way to get an equivalent structure with arrays:

>>> df.groupby("B").apply(pd.DataFrame.to_numpy).values
[array([['1990-02', 1],
        ['1990-03', 1],
        ['1999-05', 1]], dtype=object)
 array([['1992-08', 2],
        ['1996-12', 2],
        ['2020-01', 2]], dtype=object)
 array([['1990-05', 3],
        ['1995-08', 3],
        ['1999-11', 3],
        ['2021-12', 3]], dtype=object)]

This should make the job

import pandas as pd

data = {"A": ["1990-02", "1990-03","1999-05","1992-08","1996-12",
              "2020-01","1990-05","1995-08","1999-11", "2021-12"],
        "B": [1,1,1,2,2,2,3,3,3,3]}

df = pd.DataFrame(data=data)
out = df.groupby("B")['A'].apply(list)

output = [[[date, b_value] for date in block] 
          for b_value, block in zip(out.index, out.values)]

print(output)

Here is one way to get exactly what you want:

df.assign(l=df.agg(list, axis=1)).groupby('B')['l'].agg(list).tolist()

output:

[[['1990-02', 1], ['1990-03', 1], ['1999-05', 1]],
 [['1992-08', 2], ['1996-12', 2], ['2020-01', 2]],
 [['1990-05', 3], ['1995-08', 3], ['1999-11', 3], ['2021-12', 3]]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM