Pandas group in series

Question

Given

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})

I need to treat a and b as one group, c as a second group, and d and e as a third group. How do I get the first element of every group? Expected result:

pd.DataFrame({'group': [1, 2, 1], 'value': ['a', 'c', 'd']})

Answer 1

Keep only the rows where group differs from the previous row, i.e. the first row of each consecutive run:

df1 = df[df['group'].ne(df['group'].shift())]

Check this answer for more details
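
To make the comparison with the shifted column easier to follow, here is a minimal sketch using the DataFrame from the question. The names starts, run_id, and first_by_run are illustrative, and the cumsum variant is an alternative formulation, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})

# True wherever group differs from the previous row. The first row is always
# True because shift() puts NaN there, and 1 != NaN.
starts = df['group'].ne(df['group'].shift())

# Option A: the boolean mask keeps the first row of each consecutive run.
first_rows = df[starts].reset_index(drop=True)

# Option B: the cumulative sum of the flags gives every run its own id,
# which can then be used as a groupby key.
run_id = starts.cumsum()
first_by_run = df.groupby(run_id.rename('run')).first()

print(first_rows)
print(first_by_run)
```

Option B is useful when you need more than the first row per run, for example an aggregate over each run.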

Answer 2

You haven't specified whether the group column determines which values belong to the same group. So I'm assuming it has no connection, and that you specify your groups in a groups list:

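import numpy as np

groups = [['a', 'b'], ['c'], ['d', 'e']]

# One boolean mask per group: True where the value belongs to that group
condlist = [df['value'].isin(group) for group in groups]
# Each group is labelled with its position in the groups list
choicelist = list(range(len(groups)))
# Note: rows that match no group get np.select's default label, 0
group_idx = np.select(condlist, choicelist)

# First row of every labelled group
df.groupby(group_idx).first()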

Result:

   group value
0      1     a
1      2     c
2      1     d
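
One caveat with np.select: a value that appears in none of the listed groups receives the default label 0 and is silently merged into the first group. The sketch below is one possible way to guard against that, assuming a hypothetical extra row 'z' that belongs to no group; the -1 sentinel and the matched mask are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as the question plus one hypothetical row ('z') that is in no group.
df = pd.DataFrame({'group': [1, 1, 2, 1, 1, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'z']})
groups = [['a', 'b'], ['c'], ['d', 'e']]

condlist = [df['value'].isin(group).to_numpy() for group in groups]
choicelist = list(range(len(groups)))

# default=-1 marks unmatched rows explicitly instead of labelling them 0.
group_idx = np.select(condlist, choicelist, default=-1)

# Drop the unmatched rows before taking the first row of each group.
matched = group_idx >= 0
result = df[matched].groupby(group_idx[matched]).first()

print(result)
```

With the original five-row DataFrame every value is covered, so this guard changes nothing there.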

Answer 3

You can define the groups explicitly, map each value to the index of its group, and group by that mapping:

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})
groups = [['a', 'b'], ['c'], ['d', 'e']]
# Map every value to the index of the group that contains it
mappings = {k: i for i, gr in enumerate(groups) for k in gr}

print(
    df.groupby(df['value'].map(mappings)).first()
)
       group value
value             
0          1     a
1          2     c
2          1     d
