Pandas group in series

Given

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value':['a','b','c','d','e']})

I need to treat a and b as one group, c as a second group, and d and e as a third group. How can I get the first element of every group? The expected result is:

pd.DataFrame({'group': [1, 2, 1], 'value': ['a', 'c', 'd']})

Try this:

df1 = df[df['group'].ne(df['group'].shift())]

Check this answer for more details
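
To see why the one-liner above works, here is a minimal sketch that breaks it into steps. The names starts, run_id, first_rows and first_by_run are only illustrative and are not from the original answer. Comparing each row with the previous one marks where a new consecutive run begins, and a cumulative sum of those marks gives every run its own id, which can also be passed to groupby:

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})

# True wherever the label differs from the previous row. The first row is
# always True because shift() puts NaN there and 1 != NaN.
starts = df['group'].ne(df['group'].shift())

# A running count of the starts labels each consecutive run: 1, 1, 2, 3, 3.
# Renamed so the grouping key does not share a name with the 'group' column.
run_id = starts.cumsum().rename('run')

# Option 1: keep only the rows that start a run (same as the one-liner).
first_rows = df[starts]

# Option 2: group by the run id and take the first row of each run.
first_by_run = df.groupby(run_id).first()

print(first_rows)
print(first_by_run)
```

Option 1 keeps the original index (0, 2, 3), while option 2 is indexed by the run id, which may be more convenient if you also need other per-run aggregates.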

You haven't specified whether the group column determines which values belong to the same group. So I'm assuming it has no connection, and that you specify your groups in the groups list:

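import numpy as np

groups = [['a', 'b'], ['c'], ['d', 'e']]

# One boolean mask per group, paired with that group's position in the list
condlist = [df['value'].isin(group) for group in groups]
choicelist = list(range(len(groups)))
# Note: values that appear in no group get np.select's default, which is 0
group_idx = np.select(condlist, choicelist)

df.groupby(group_idx).first()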

Result:

   group value
0      1     a
1      2     c
2      1     d
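
One thing to watch with this approach: np.select falls back to its default of 0 for rows that match no condition, so a value missing from the groups list would silently land in the first group. A possible guard, sketched below, is to pass default=-1 and drop those rows before grouping. The extra 'z' row and the names matched and result are made up for illustration only:

```python
import numpy as np
import pandas as pd

# Same data as above plus a hypothetical 'z' row that belongs to no group.
df = pd.DataFrame({'group': [1, 1, 2, 1, 1, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'z']})
groups = [['a', 'b'], ['c'], ['d', 'e']]

condlist = [df['value'].isin(group) for group in groups]
choicelist = list(range(len(groups)))

# default=-1 marks rows that match none of the groups.
group_idx = np.select(condlist, choicelist, default=-1)

# Keep only the rows that were assigned to a real group.
matched = group_idx >= 0
result = df[matched].groupby(group_idx[matched]).first()

print(result)
```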

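You can define your groups, map each value to the index of its group, and group on that mapping:

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 2, 1, 1], 'value': ['a', 'b', 'c', 'd', 'e']})
groups = [['a', 'b'], ['c'], ['d', 'e']]
# Map every value to the position of the group that contains it
mappings = {k: i for i, gr in enumerate(groups) for k in gr}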

print(
    df.groupby(df['value'].map(mappings)).first()
)

Result:

       group value
value             
0          1     a
1          2     c
2          1     d
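
With the dictionary approach, a value that is missing from mappings becomes NaN after map, and groupby drops NaN keys by default (dropna=True). A small sketch of that behaviour, where the 'z' row and the names keys and result are hypothetical and added only for this illustration:

```python
import pandas as pd

# Same data plus a hypothetical 'z' row that is not listed in any group.
df = pd.DataFrame({'group': [1, 1, 2, 1, 1, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'z']})
groups = [['a', 'b'], ['c'], ['d', 'e']]
mappings = {k: i for i, gr in enumerate(groups) for k in gr}

# 'z' has no entry in mappings, so its key is NaN.
keys = df['value'].map(mappings)

# Rows with a NaN key are left out of the result by default.
result = df.groupby(keys).first()

print(result)
```

If you would rather fail loudly on unlisted values, checking keys.isna().any() before grouping is one option.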
