I want to combine several rows into a single row based on a condition (language: Python, data frame: pandas). For example:
Current data:
       0        1     2    3     4
0  data1
1         string1  num1  ex1  bla1
2         string2
3         string3
4  data2
5         string4  num2  ex2  bla2
6         string5
Result:
       0                        1     2    3     4
0  data1  string1 string2 string3  num1  ex1  bla1
0  data2          string4 string5  num2  ex2  bla2
But I cannot work out the logic for this. Any ideas?
Use:
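# Forward-fill column '0' so every row carries the label of its group
g = df['0'].ffill()
# Take the first non-null value of each column within each group
d = df.groupby(g, sort=False).first()
# Replace column '1' with all of the group's strings joined by spaces
d['1'] = df['1'].dropna().groupby(g).agg(' '.join)
# Drop the group labels from the index to get a plain 0..n-1 index
d = d.reset_index(drop=True)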
Details:
Create a grouper g using Series.ffill on df['0']:
print(g)
0 data1
1 data1
2 data1
3 data1
4 data2
5 data2
6 data2
Name: 0, dtype: object
Use DataFrame.groupby to group the dataframe on the grouper g and aggregate with first, which keeps the first non-null value of each column per group. Then use Series.dropna on column 1, group it on g with Series.groupby, and aggregate with join to concatenate the strings of each group. Finally, use reset_index:
print(d)
       0                        1     2    3     4
0  data1  string1 string2 string3  num1  ex1  bla1
1  data2          string4 string5  num2  ex2  bla2
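For anyone who wants to try this end to end, here is a minimal self-contained sketch. The question does not show how df was built, so the frame below is an assumption: column names are taken to be the strings '0' to '4' (as the answer's df['0'] implies) and the blank cells are taken to be NaN. If your blanks are empty strings, convert them to NaN first.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data; blank cells are NaN here.
df = pd.DataFrame({
    '0': ['data1', np.nan, np.nan, np.nan, 'data2', np.nan, np.nan],
    '1': [np.nan, 'string1', 'string2', 'string3', np.nan, 'string4', 'string5'],
    '2': [np.nan, 'num1', np.nan, np.nan, np.nan, 'num2', np.nan],
    '3': [np.nan, 'ex1', np.nan, np.nan, np.nan, 'ex2', np.nan],
    '4': [np.nan, 'bla1', np.nan, np.nan, np.nan, 'bla2', np.nan],
})

# Same steps as in the answer above.
g = df['0'].ffill()                                   # group label for every row
d = df.groupby(g, sort=False).first()                 # first non-null value per column
d['1'] = df['1'].dropna().groupby(g).agg(' '.join)    # join the strings of each group
d = d.reset_index(drop=True)

print(d)
```

The second groupby works on the shortened Series because pandas aligns g to it by index label, so the rows removed by dropna are simply ignored.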
You have a choice of how to handle the other axes (other than the one being concatenated). This can be done in the following two ways:
* Take the union of them all, join='outer'. This is the default option as it results in zero information loss.
* Take the intersection, join='inner'.
Here is an example of each of these methods. First, the default join='outer' behaviour:
In [8]: df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
   ...:                     'D': ['D2', 'D3', 'D6', 'D7'],
   ...:                     'F': ['F2', 'F3', 'F6', 'F7']},
   ...:                    index=[2, 3, 6, 7])
   ...:
In [9]: result = pd.concat([df1, df4], axis=1, sort=False)
Here is the same thing with join='inner':
In [10]: result = pd.concat([df1, df4], axis=1, join='inner')
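The excerpt above uses df1 without defining it, so the following is only a sketch with an assumed stand-in for df1 (two columns, row labels 0 to 3). It is meant to show how the two join modes differ in which row labels survive when concatenating along axis=1.

```python
import pandas as pd

# Hypothetical stand-in for df1, which the excerpt does not define.
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']},
                   index=[0, 1, 2, 3])

df4 = pd.DataFrame({'B': ['B2', 'B3', 'B6', 'B7'],
                    'D': ['D2', 'D3', 'D6', 'D7'],
                    'F': ['F2', 'F3', 'F6', 'F7']},
                   index=[2, 3, 6, 7])

# join='outer' (the default): keep the union of row labels, filling gaps with NaN.
outer = pd.concat([df1, df4], axis=1)

# join='inner': keep only the row labels present in both frames.
inner = pd.concat([df1, df4], axis=1, join='inner')

print(outer.index.tolist())
print(inner.index.tolist())
```

Note that both results contain two columns named B, because concatenating along axis=1 places the frames side by side rather than merging columns with the same name.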