
Is there any method to combine multiple rows into one by condition using pandas (not groupby)?

I have a DataFrame, loaded from Excel, in which one sentence is spread over several rows of the same column. I would like to combine those rows into one, based on another column. Is there any way to do this without groupby? I will need to use iterrows on the data later.

Example is as below:

import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020,'', '', '', 17042020,'', '', ''],
                    'des': ['I','am', 'happy','I','am', 'a','butterfly','I','am', 'a', 'girl']})

print (df)

Output:

        date        des
0   16042020          I
1                    am
2                 happy
3   17042020          I
4                    am
5                     a
6             butterfly
7   17042020          I
8                    am
9                     a
10                 girl

The expected output is as below (based on date):

        date        des            Result
0   16042020          I        I am happy
1                    am                  
2                 happy                  
3   17042020          I  I am a butterfly
4                    am                  
5                     a                  
6             butterfly                  
7   17042020          I       I am a girl
8                    am                  
9                     a                  
10                 girl                 

A rather ugly, brute-force, but perhaps easy-to-follow solution would be:

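df['result'] = ""

rows = df.shape[0]
i = 0
while i < rows:
    if df.iloc[i, 0] != "":
        # A row with a date starts a new sentence
        msg = df.iloc[i, 1]
        j = i + 1
        # Append the following rows until the next date (or the end)
        while (j < rows) and (df.iloc[j, 0] == ""):
            msg = msg + " " + df.iloc[j, 1]
            j += 1
        df.iloc[i, 2] = msg
        i = j
    else:
        # Leading rows without a date: skip them so the loop cannot run forever
        i += 1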

I don't see any way of doing this without looping (implicit or explicit).

How about this, using ' '.join(), loc and zip:

indexes = df[df.date != ''].index.to_list() + [len(df)]
for i, i1 in zip(indexes, indexes[1:]):
    # same as df.loc[i:i1-1, 'des'].str.cat(sep=' ')
    df.loc[i, 'result'] = ' '.join(df.loc[i:i1-1, 'des'])
df = df.fillna('')

Output:

df
        date        des            result
0   16042020          I        I am happy
1                    am                  
2                 happy                  
3   17042020          I  I am a butterfly
4                    am                  
5                     a                  
6             butterfly                  
7   17042020          I       I am a girl
8                    am                  
9                     a                  
10                 girl                  
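
To see why the zip pairing in this answer works, here is a small sketch that prints the intermediate values. The names starts, spans and sentences are only illustrative and are not part of the answer above; it also assumes the default 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

# Row labels where a new date starts, plus len(df) as an end sentinel.
starts = df[df['date'] != ''].index.to_list() + [len(df)]

# Pair each start with the next start: (0, 3), (3, 7), (7, 11).
spans = list(zip(starts, starts[1:]))

# .loc slicing is label-based and includes the end label, hence the "- 1".
sentences = [' '.join(df.loc[start:stop - 1, 'des']) for start, stop in spans]

print(spans)
print(sentences)
```

Each pair marks one block of rows, so the loop runs once per date rather than once per row.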

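If the only problem is calling iterrows() after a groupby, you can call reset_index() on the aggregated result, which gives you an ordinary DataFrame again.

Or you can use apply with a lambda instead. Note that this matches rows by equal date value, so the two 17042020 rows are matched with each other, as are all the rows with an empty date; on its own it does not reproduce the expected output above:

df['result'] = df.apply(lambda x: " ".join(df[df['date'] == x['date']]['des'].tolist()), axis=1)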
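
The reset_index() suggestion could look roughly like the sketch below. The answer gives no grouping key, so the block key here (a cumulative count of rows that have a date) is an assumption, as are the names block and combined.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

# Assumed helper key: increases by 1 at every row that has a date.
block = (df['date'] != '').cumsum().rename('block')

# Aggregate per block, then reset_index() to get an ordinary DataFrame back.
combined = (df.groupby(block)
              .agg(date=('date', 'first'), result=('des', ' '.join))
              .reset_index())

# The aggregated frame can be iterated like any other DataFrame.
for _, row in combined.iterrows():
    print(row['date'], '->', row['result'])
```

Here combined has one row per sentence rather than one row per word, which may or may not suit the later iterrows() step.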

This is more complicated because the data structure has no explicit breaks. There are repeating dates that act as a break between concatenations.

  1. Synthesise foo as an internal column whose value changes each time a value is seen in the date column.
  2. Concatenate the strings within each block delimited by those breaks, and push the result back into the original data frame using transform().
  3. Finally, clean up result by setting it to empty wherever there is no value in the date column.

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020,'', '', '', 17042020,'', '', ''],
                    'des': ['I','am', 'happy','I','am', 'a','butterfly','I','am', 'a', 'girl']})

# foo holds the index of the most recent row with a date, forward-filled over the blank rows
df["result"] = df.assign(foo=df.reset_index()\
          .apply(lambda r: r["index"] if str(r["date"]).strip()!="" else np.nan, axis=1).ffill())\
          .groupby("foo")["des"].transform(lambda x: " ".join(x))
# date mixes ints and strings, so convert to str before stripping
df.loc[df["date"].astype(str).str.strip()=="", "result"] = ""
df

Output:

     date        des            result
 16042020          I        I am happy
                  am                  
               happy                  
 17042020          I  I am a butterfly
                  am                  
                   a                  
           butterfly                  
 17042020          I       I am a girl
                  am                  
                   a                  
                girl                  
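
As a variation on the three steps above, the break marker can also be built with a cumulative sum instead of apply. This is only a sketch of that alternative, not the answer's own code; has_date is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})

# True on rows that carry a date (date mixes ints and strings, hence astype(str)).
has_date = df['date'].astype(str).str.strip() != ''

# Step 1: a key that changes whenever a date is seen: 1, 1, 1, 2, 2, 2, 2, 3, ...
foo = has_date.cumsum()

# Step 2: join each block and broadcast the sentence to every row with transform().
result = df.groupby(foo)['des'].transform(lambda x: ' '.join(x))

# Step 3: keep the sentence only on the rows that have a date.
df['result'] = result.where(has_date, '')

print(foo.tolist())
print(df)
```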
