
Pandas combine multiple rows into one row with condition

I want to combine multiple rows into a single row based on a condition (language: Python, data frame library: pandas). For example:

Current data:

        0      1        2     3    4
     0  data1
     1         string1  num1  ex1  bla1    
     2         string2
     3         string3
     4  data2  
     5         string4  num2  ex2  bla2
     6         string5

Result:

        0      1                        2     3    4 
     0  data1  string1 string2 string3  num1  ex1  bla1
     0  data2  string4 string5          num2  ex2  bla2

But I cannot work out the logic for this. Any ideas?

Use:

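# Grouper: forward-fill the group labels in column '0' (blank cells must be NaN)
g = df['0'].ffill()

# First non-null value of every column within each group
d = df.groupby(g, sort=False).first()
# Column '1' instead gets all of its strings per group, joined with spaces
d['1'] = df['1'].dropna().groupby(g).agg(' '.join)
# Replace the group labels in the index with 0, 1, ...
d = d.reset_index(drop=True)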

Details:

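Create a grouper g by calling Series.ffill on df['0']. Forward-filling copies each group label (data1, data2) down onto the rows beneath it, so every row carries the label of the group it belongs to. This relies on the blank cells being NaN: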

print(g)
0     data1
1     data1
2     data1
3     data1
4     data2
5     data2
6     data2
Name: 0, dtype: object
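
If the blank cells were read in as empty or whitespace-only strings rather than NaN, ffill has nothing to fill. A minimal sketch of one way to normalise them first, using a hypothetical column col that is not part of the original data:

```python
import pandas as pd

# Hypothetical column in which blanks arrived as empty/whitespace strings.
col = pd.Series(["data1", "", "  ", "", "data2", "", ""], name="0")

# mask() replaces the entries where the condition is True with NaN.
cleaned = col.mask(col.str.strip().eq(""))

# Now the forward fill can propagate the group labels downwards.
g = cleaned.ffill()
print(g.tolist())
```

Whether this step is needed depends on how the data was loaded, which the question does not say.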

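Then build the result in three steps. First, use DataFrame.groupby to group the dataframe on g and aggregate with first, which keeps the first non-null value of each column per group. Next, overwrite column 1: drop its missing values with Series.dropna, group it on g with Series.groupby, and aggregate with ' '.join so that all strings in a group are concatenated. Finally, call reset_index(drop=True) to replace the group labels in the index with a default integer index: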

print(d)
       0                        1     2    3     4
0  data1  string1 string2 string3  num1  ex1  bla1
1  data2          string4 string5  num2  ex2  bla2
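
For anyone who wants to run the steps end to end, here is a self-contained sketch. The frame is rebuilt by hand from the question's sample, assuming the blank cells are NaN and the column labels are the strings '0' to '4'; neither is confirmed by the question.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the question (blank cells assumed to be NaN).
df = pd.DataFrame({
    "0": ["data1", np.nan, np.nan, np.nan, "data2", np.nan, np.nan],
    "1": [np.nan, "string1", "string2", "string3", np.nan, "string4", "string5"],
    "2": [np.nan, "num1", np.nan, np.nan, np.nan, "num2", np.nan],
    "3": [np.nan, "ex1", np.nan, np.nan, np.nan, "ex2", np.nan],
    "4": [np.nan, "bla1", np.nan, np.nan, np.nan, "bla2", np.nan],
})

# Every row gets the label of the group it belongs to.
g = df["0"].ffill()

# First non-null value per column and group.
d = df.groupby(g, sort=False).first()

# Column '1' is replaced by the space-joined strings of each group.
d["1"] = df["1"].dropna().groupby(g).agg(" ".join)

# Drop the group labels from the index; column '0' still holds them.
d = d.reset_index(drop=True)
print(d)
```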

You have a choice of how to handle the other axes (other than the one being concatenated). This can be done in the following two ways:

* Take the union of them all, join='outer'. This is the default option as it results in zero information loss.
* Take the intersection, join='inner'.

Here is an example of each of these methods. First, the default join='outer' behaviour:

In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
   ...:                     'D': ['D2', 'D3', 'D6', 'D7'],
   ...:                     'F': ['F2', 'F3', 'F6', 'F7']},
   ...:                    index=[2, 3, 6, 7])
   ...: 

In [9]: result = pd.concat([df1, df4], axis=1, sort=False)

Here is the same thing with join='inner':

 In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
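
Since df1 is not defined in the snippets above, here is a runnable sketch of the two join modes. The contents of df1 are an assumption made up for illustration; only df4 comes from the example.

```python
import pandas as pd

# Hypothetical df1 (not shown in the original snippets), indexed 0..3.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                    "B": ["B0", "B1", "B2", "B3"]},
                   index=[0, 1, 2, 3])

# df4 as defined in the example, indexed 2, 3, 6, 7.
df4 = pd.DataFrame({"B": ["B2", "B3", "B6", "B7"],
                    "D": ["D2", "D3", "D6", "D7"],
                    "F": ["F2", "F3", "F6", "F7"]},
                   index=[2, 3, 6, 7])

# join='outer' (the default): keep the union of the row labels, fill gaps with NaN.
outer = pd.concat([df1, df4], axis=1)

# join='inner': keep only the row labels present in both frames.
inner = pd.concat([df1, df4], axis=1, join="inner")

print(outer)
print(inner)
```

With these inputs the outer result keeps six row labels and the inner result keeps the two they share (2 and 3). Column B appears twice in both, because concat along axis=1 does not merge columns with the same name.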
