Consolidation of consecutive rows by condition with Python Pandas

I am trying to solve the following data problem. I have a dataframe of values and their label lists (the task is multi-label, so each row's labels are stored as a list).

The dataframe looks like:

     | value| labels
---------------------
row_1| A    |[label1]
row_2| B    |[label2]
row_3| C    |[label3, label4]
row_4| D    |[label4, label5]

I want to find all rows that have a specific label and then:

  1. Concatenate its value with the next row's value, placing it before the next row's string.
  2. Prepend its labels to the next row's label list.

For example, if I want to do that for label2, the desired output is:

     | value| labels
---------------------
row_1| A    |[label1]
row_3| BC   |[label2, label3, label4]
row_4| D    |[label4, label5]

The value "B" is joined in front of the next row's value, and the label "label2" is prepended to the next row's label list. The index values do not matter to me.

I would greatly appreciate help with this. I tried merge, join, shift, and cumsum, but without success so far.


The following code creates the data in the example:

import pandas as pd

data = {'row_1': ["A", ["label1"]], 'row_2': ["B", ["label2"]],
        'row_3': ["C", ["label3", "label4"]], 'row_4': ["D", ["label4", "label5"]]}
df = pd.DataFrame.from_dict(data, orient='index').rename(columns={0: "value", 1: "labels"})

You could create a grouping variable and use it to aggregate the columns:

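import pandas as pd
import numpy as np

def my_combine(data, value):
    # True for rows whose label list contains `value`
    index = data['labels'].apply(lambda x: value in x)
    if not index.any():
        return data
    # mark each matching row together with the row that follows it
    idx = (index | index.shift(fill_value=False)).to_numpy()
    # unmarked rows get a unique non-zero id, marked rows get 0
    vals = (np.arange(idx.size) + 1) * (~idx)
    # last position of each run of equal ids
    gr = np.r_[np.where(vals[1:] != vals[:-1])[0], vals.size - 1]
    # repeat each run's end position over the run to build the group keys
    groups = np.repeat(gr, np.diff(np.r_[-1, gr]))
    # "sum" concatenates both the strings and the label lists
    return data.groupby(groups).agg("sum")

my_combine(df, 'label2')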

  value                    labels
0     A                  [label1]
2    BC  [label2, label3, label4]
3     D          [label4, label5]
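
The same grouping idea can also be sketched with shift and cumsum, which the question mentions trying. This is only an alternative sketch, not the approach used above: the function name merge_into_next and the explicit aggregation lambdas are my own choices. A row starts a new group unless the row directly before it carries the label, so each labeled row is folded into its successor. One difference worth noting: when labeled rows alternate with unlabeled ones (say the label sits on rows 2 and 4 of 5), the run-based grouping above appears to collapse rows 2 through 5 into a single group, while this version merges them pairwise.

```python
import pandas as pd

def merge_into_next(df, label):
    # hypothetical helper name; assumes columns "value" (str) and "labels" (list)
    # True for rows whose label list contains `label`
    has_label = df["labels"].apply(lambda labels: label in labels)
    # a row starts a new group unless the previous row carried the label
    starts_group = ~has_label.shift(fill_value=False)
    # running count of group starts gives one id per merged block
    groups = starts_group.cumsum().to_numpy()
    merged = df.groupby(groups).agg(
        value=("value", lambda s: "".join(s)),
        labels=("labels", lambda s: [x for lst in s for x in lst]),
    )
    # the question says the index is irrelevant, so drop the group ids
    return merged.reset_index(drop=True)

df = pd.DataFrame({
    "value": ["A", "B", "C", "D"],
    "labels": [["label1"], ["label2"], ["label3", "label4"], ["label4", "label5"]],
})
print(merge_into_next(df, "label2"))
```

A labeled row in the last position has no successor, so in this sketch it simply stays as its own row.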
