Is there any method to combine multiple rows into one by condition using pandas (not groupby)?

Question

I would like to combine several rows of a DataFrame into one value, grouping them by another column that comes from my Excel data. Is there a way to do this without using groupby, since I will need to call iterrows on the data later?

Example is as below:

import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020,'', '', '', 17042020,'', '', ''],
                    'des': ['I','am', 'happy','I','am', 'a','butterfly','I','am', 'a', 'girl']})

print(df)

**OUTPUT:**

        date        des
0   16042020          I
1                    am
2                 happy
3   17042020          I
4                    am
5                     a
6             butterfly
7   17042020          I
8                    am
9                     a
10                 girl

The expected output is as below (based on date):

        date        des            Result
0   16042020          I        I am happy
1                    am                  
2                 happy                  
3   17042020          I  I am a butterfly
4                    am                  
5                     a                  
6             butterfly                  
7   17042020          I       I am a girl
8                    am                  
9                     a                  
10                 girl

Answer 1

A rather ugly, brute-force, but perhaps easy-to-follow solution would be:

df['result'] = ""

rows = df.shape[0]
i = 0
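while i < rows:
    if df.iloc[i, 0] != "":
        # Row i starts a block: append the words of the following rows that have no date
        msg = df.iloc[i, 1]
        j = i + 1
        while (j < rows) and (df.iloc[j, 0] == ""):
            msg = msg + " " + df.iloc[j, 1]
            j += 1
        df.iloc[i, 2] = msg
        i = j
    else:
        # No date and no open block (e.g. an empty first row): skip it to avoid looping forever
        i += 1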

I don't see any way of doing this without looping (implicit or explicit).
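For what it's worth, the same explicit loop can also be written as a single pass over the two columns as plain Python values, which avoids the repeated df.iloc lookups. This is only a sketch: the helper name combine_blocks is made up for illustration, and it assumes that an empty string in date marks a continuation row, as in the question's example.

```python
import pandas as pd

def combine_blocks(df, key="date", text="des"):
    """Hypothetical helper: joined text on each block's first row, '' elsewhere."""
    results = [""] * len(df)
    start = None   # position of the current block's first row
    words = []     # words collected for the current block
    for pos, (k, word) in enumerate(zip(df[key], df[text])):
        if k != "":                      # a non-empty key starts a new block
            if start is not None:
                results[start] = " ".join(words)
            start, words = pos, [word]
        elif start is not None:          # continuation row of an open block
            words.append(word)
    if start is not None:                # flush the last block
        results[start] = " ".join(words)
    return results

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})
df["result"] = combine_blocks(df)
```

Rows that appear before the first dated row are left empty rather than attached to any block.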

Answer 2

How about this, using ' '.join(), loc and zip:

indexes = df[df.date != ''].index.to_list() + [len(df)]
for i, i1 in zip(indexes, indexes[1:]):
    df.loc[i, 'result'] = ' '.join(df.loc[i:i1-1, 'des'])  # same as df.loc[i:i1-1, 'des'].str.cat(sep=' ')
df = df.fillna('')

Output:

df
        date        des            result
0   16042020          I        I am happy
1                    am                  
2                 happy                  
3   17042020          I  I am a butterfly
4                    am                  
5                     a                  
6             butterfly                  
7   17042020          I       I am a girl
8                    am                  
9                     a                  
10                 girl
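To make the zip step in this answer easier to follow, here is a small sketch of the boundary pairs it produces for the example data. It assumes the default RangeIndex (0, 1, 2, ...), so that index labels and row positions coincide; the variable names starts and pairs are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

# Labels of the rows that start a block, plus len(df) as a sentinel closing the last block
starts = df[df.date != ''].index.to_list() + [len(df)]

# Pair each start with the next start: (block start, start of the following block)
pairs = list(zip(starts, starts[1:]))
print(pairs)  # [(0, 3), (3, 7), (7, 11)]
```

Because df.loc slices include the end label, the answer slices up to i1-1 so that the first row of the next block is not pulled in.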

Answer 3

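If the problem is only with using iterrows() after a groupby, you can call reset_index() on the aggregated result, which gives you back an ordinary DataFrame.

Or you can use a lambda instead:

df['result'] = df.apply(lambda x: " ".join(df[df['date'] == x['date']]['des'].tolist()), axis=1)

Note that this matches rows by their date value rather than by position, so it only gives the expected result when every row of a block carries its date. On the example data as posted, the two 17042020 rows would each get "I I" and the empty-date rows would be joined with one another.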
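The reset_index() route mentioned in this answer could look like the sketch below. It rests on an assumption that is not stated in the answer: a non-empty date starts a new block, so a running count of non-empty dates can serve as the grouping key. The names block and combined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

# Assumption: each non-empty date opens a new block; cumsum numbers the blocks 1, 2, 3
block = df["date"].ne("").cumsum().rename("block")

# Aggregate one row per block, then reset_index() to get a plain DataFrame back
combined = (
    df.groupby(block)
      .agg(date=("date", "first"), result=("des", lambda s: " ".join(s)))
      .reset_index()
)

# The aggregated frame can be iterated like any other DataFrame
for _, row in combined.iterrows():
    print(row["date"], row["result"])
```

Unlike the expected output in the question, this yields one row per block instead of keeping the original eleven rows.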

Answer 4

This is more complicated because the data structure has no explicit breaks; the repeating dates act as the breaks between concatenations.

Synthesise foo as an internal column that changes whenever a value is seen in the date column.
Concatenate the strings within each break and push the result back into the original DataFrame using transform().
Finally, clean up result by setting it to empty wherever there is no value in the date column.

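import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020,'', '', '', 17042020,'', '', ''],
                    'des': ['I','am', 'happy','I','am', 'a','butterfly','I','am', 'a', 'girl']})

# foo holds the index of the most recent row that has a date, so it changes at every break
foo = df.reset_index()\
          .apply(lambda r: r["index"] if str(r["date"]).strip()!="" else np.nan, axis=1).ffill()
# join the words of each break and broadcast the sentence back onto every row of that break
df["result"] = df.assign(foo=foo)\
          .groupby("foo")["des"].transform(lambda x: " ".join(x))
# keep the sentence only on the rows that carry a date
df.loc[df["date"].astype(str).str.strip()=="", "result"] = ""
df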

output

     date        des            result
 16042020          I        I am happy
                  am                  
               happy                  
 17042020          I  I am a butterfly
                  am                  
                   a                  
           butterfly                  
 17042020          I       I am a girl
                  am                  
                   a                  
                girl
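The same idea can probably be expressed more compactly by using a cumulative count of the non-empty dates as the break marker. The sketch below is a variation on this answer, not the answer's own code; it assumes that empty strings (not NaN or whitespace) mark the rows without a date, and the names has_date, block and joined are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

has_date = df["date"].ne("")                 # True on the rows that start a break
block = has_date.cumsum().rename("block")    # break id: 1, 1, 1, 2, 2, 2, 2, 3, ...

# transform() broadcasts each joined sentence back to every row of its break
joined = df.groupby(block)["des"].transform(lambda s: " ".join(s))

# Keep the sentence only on the first row of each break, blank elsewhere
df["result"] = joined.where(has_date, "")
```

The frame keeps its original shape and index, so it can still be walked with iterrows afterwards.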

4 answers

solution1
1 ACCEPTED 2020-07-20 09:58:48

solution2
1 2020-07-20 11:57:21

solution3
0 2020-07-20 09:43:42

solution4
0 2020-07-20 10:53:03