
Python dataframes: Merge values of columns according to a specific condition

Hi, I have a dataframe question. Let's say I have a dataframe in a format like this:

label    value 
1        a
1        b 
2  
2 
1        c
1        d

So there are two separate runs of consecutive rows with label 1. I want output like this:

output: [ab,cd] 

That is, the values within each separate run of label 1 are merged together. Thank you.

You can use itertools.groupby, which groups only adjacent items that share the same key:

from itertools import groupby
from operator import itemgetter

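# Pair each label with its value, row by row
zipper = zip(df['label'], df['value'])
# groupby only merges adjacent pairs that share the same label
grouper = groupby(zipper, key=itemgetter(0))
# Join the values of every run whose label is 1
res = [''.join(map(itemgetter(1), j)) for i, j in grouper if i == 1]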

['ab', 'cd']
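To see why adjacency matters here, the following is a minimal sketch that uses plain lists in place of the dataframe columns (the `labels` and `values` lists are stand-ins for the question's data, not part of the original answer). It shows that itertools.groupby yields label 1 twice, once per run, instead of collapsing all label 1 rows into one group.

```python
from itertools import groupby

# Stand-ins for df['label'] and df['value'] from the question
labels = [1, 1, 2, 2, 1, 1]
values = ["a", "b", None, None, "c", "d"]

# Each run of adjacent equal labels becomes one (label, [values]) entry
runs = [
    (key, [value for _, value in group])
    for key, group in groupby(zip(labels, values), key=lambda pair: pair[0])
]

# Keep only the runs with label 1 and concatenate their values
merged = ["".join(run_values) for key, run_values in runs if key == 1]

print(runs)
print(merged)
```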

You can concatenate each row's value with the value of the next row, and keep only the pairs where both rows have label 1:

>> df['label'] = df['label'].astype(str)

>> res = df + df.shift(-1)

  label value
0    11    ab
1    12   NaN
2    22   NaN
3    21   NaN
4    11    cd
5   NaN   NaN

Then filter res for the rows where label equals '11':

>> res[res['label'].eq('11')]['value'].values.tolist()

['ab', 'cd']
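The steps above can be reproduced end to end with a small sketch. The dataframe construction is an assumption based on the question's table, with missing values written as None. Because each row is only added to the row directly after it, the sketch assumes every run of label 1 contains exactly two rows.

```python
import pandas as pd

# Assumed reconstruction of the question's dataframe
df = pd.DataFrame({
    "label": [1, 1, 2, 2, 1, 1],
    "value": ["a", "b", None, None, "c", "d"],
})

# Turn labels into strings so that '+' concatenates instead of adding numbers
df["label"] = df["label"].astype(str)

# Concatenate every row with the row that follows it
res = df + df.shift(-1)

# A combined label of '11' means both rows of the pair had label 1
pairs = res.loc[res["label"].eq("11"), "value"].tolist()
print(pairs)
```

With a run of three rows, the same pairing would give overlapping results ('ab' and 'bc') rather than a single 'abc'.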

You can try:

-> grouping the dataframe by runs of consecutive equal labels and summing (concatenating) the values within each run

-> grouping the dataframe by label to collect the merged values for each label as a list

Consider this dataframe:

    label   value
0   1   a
1   1   b
2   2   NaN
3   2   NaN
4   1   c
5   1   d
6   1   e
7   3   b
8   3   c

# Build a key that changes whenever the label differs from the previous row,
# then concatenate the values of each run of consecutive equal labels
run_key = df.label.diff(1).abs().cumsum().fillna(0)
df['value1'] = df.groupby(run_key)['value'].transform('sum')

0    4.0
1    4.0
2    3.0
3    3.0
4    2.0
5    2.0
6    2.0
7    0.0
8    0.0

# Group by label and collect the distinct merged values for each label
df.groupby(df.label).apply(lambda x: x['value1'].unique())

Out:

label
1    [ab, cde]
2        [0.0]
3         [bc]
dtype: object
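As a rough illustration of the two grouping steps, here is a self-contained sketch on the same nine-row dataframe. It is a variation rather than the answer's exact code: it assumes numeric labels (diff does not work on string labels), drops rows with missing values before merging, and joins strings with ''.join instead of sum. The names `run_key`, `clean`, and `result` are illustrative.

```python
import pandas as pd

# Assumed reconstruction of the dataframe shown above
df = pd.DataFrame({
    "label": [1, 1, 2, 2, 1, 1, 1, 3, 3],
    "value": ["a", "b", None, None, "c", "d", "e", "b", "c"],
})

# The key stays constant while the label repeats and grows when it changes
run_key = df["label"].diff().abs().cumsum().fillna(0).rename("run")

# Assumption: rows without a value are skipped rather than merged
clean = df.dropna(subset=["value"])
grouped = clean.groupby(run_key.loc[clean.index], sort=False)

merged_values = grouped["value"].agg("".join)  # one string per run
run_labels = grouped["label"].first()          # the label of each run

# Collect the merged strings per label, in order of appearance
result = {}
for label, text in zip(run_labels, merged_values):
    result.setdefault(int(label), []).append(text)

print(run_key.tolist())
print(result)
```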

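With Pandas you can filter your dataframe by label, then use GroupBy with a grouper series constructed using cumsum. The grouper increments each time the label differs from the previous row, so every run of consecutive equal labels gets its own key: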

grouper = df['label'].ne(df['label'].shift()).cumsum()

res = df.loc[df['label'] == 1]\
        .groupby(grouper)['value'].sum().tolist()

['ab', 'cd']
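A possible way to package this is sketched below. The helper name `merge_runs` and its parameters are hypothetical, the dataframe is an assumed reconstruction of the question's data, and ''.join is used in place of sum to concatenate the strings.

```python
import pandas as pd

def merge_runs(df, label, label_col="label", value_col="value"):
    """Concatenate the values of each consecutive run of `label` (hypothetical helper)."""
    # A new run starts wherever the label differs from the previous row
    run_id = df[label_col].ne(df[label_col].shift()).cumsum()
    selected = df.loc[df[label_col] == label]
    # Group the selected rows by the run they belong to
    return (
        selected.groupby(run_id.loc[selected.index])[value_col]
        .agg("".join)
        .tolist()
    )

# Assumed reconstruction of the question's dataframe
df = pd.DataFrame({
    "label": [1, 1, 2, 2, 1, 1],
    "value": ["a", "b", None, None, "c", "d"],
})

run_ids = df["label"].ne(df["label"].shift()).cumsum().tolist()
print(run_ids)
print(merge_runs(df, 1))
```

Because the run key is built from a comparison with the previous row rather than from a numeric difference, the same sketch should also work for non-numeric labels and for runs of any length.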
