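I have a DataFrame loaded from Excel in which each record is spread over several rows: the first row of a record has a value in the date column, and the following rows leave it blank. I would like to combine the des values of each record into one string, using the date column to decide where a record starts. Is there a way to do this without groupby? I will need to use iterrows on the data later.
An example is below: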
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})
print(df)
**OUTPUT:**
        date        des
0   16042020          I
1                    am
2                 happy
3   17042020          I
4                    am
5                     a
6             butterfly
7   17042020          I
8                    am
9                     a
10                 girl
The expected output is as follows (based on date):
        date        des  Result
0   16042020          I  I am happy
1                    am
2                 happy
3   17042020          I  I am a butterfly
4                    am
5                     a
6             butterfly
7   17042020          I  I am a girl
8                    am
9                     a
10                 girl
A rather ugly, brute-force, but perhaps easy-to-follow solution would be:
df['result'] = ""
rows = df.shape[0]
i = 0
while i < rows:
    if df.iloc[i, 0] != "":
        # Start of a record: collect des from this row and the blank-date rows after it
        msg = df.iloc[i, 1]
        j = i + 1
        while (j < rows) and (df.iloc[j, 0] == ""):
            msg = msg + " " + df.iloc[j, 1]
            j += 1
        df.iloc[i, 2] = msg
        i = j
    else:
        # Leading rows with no date: skip them so the loop cannot stall
        i += 1
I don't see any way of doing this without looping (implicit or explicit).
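For comparison, here is a minimal sketch of the "implicit loop" variant. It assumes that a blank date is always an empty (or whitespace-only) string, as in the sample data. The names starts, block and joined are only illustrative and do not come from the question. It does use groupby internally, but df stays an ordinary DataFrame, so iterrows still works on it afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [16042020, "", "", 17042020, "", "", "", 17042020, "", "", ""],
    "des": ["I", "am", "happy", "I", "am", "a", "butterfly", "I", "am", "a", "girl"],
})

# True on rows that start a new record (non-empty date)
starts = df["date"].astype(str).str.strip().ne("")

# Running count of record starts: 1,1,1,2,2,2,2,3,3,3,3
block = starts.cumsum()

# Join the words of each record and broadcast the sentence to all of its rows
joined = df.groupby(block)["des"].transform(" ".join)

# Keep the sentence only on the first row of each record
df["result"] = joined.where(starts, "")

print(df)
```

Because the grouping key is a running count rather than the date itself, the two records dated 17042020 stay separate.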
How about this, using ' '.join(), loc and zip:
indexes = df[df.date != ''].index.to_list() + [len(df)]
for i, i1 in zip(indexes, indexes[1:]):
    df.loc[i, 'result'] = ' '.join(df.loc[i:i1-1, 'des'])  # same as df.loc[i:i1-1, 'des'].str.cat(sep=' ')
df = df.fillna('')
Output:
df
        date        des  result
0   16042020          I  I am happy
1                    am
2                 happy
3   17042020          I  I am a butterfly
4                    am
5                     a
6             butterfly
7   17042020          I  I am a girl
8                    am
9                     a
10                 girl
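If the problem is only with using iterrows() after a groupby, you can call reset_index() on the aggregated result, which gives you a regular DataFrame again. Or you can use a lambda with apply instead:
df['result'] = df.apply(lambda x: " ".join(df[df['date'] == x['date']]['des'].tolist()), axis=1)
Note that this matches rows by their date value. In the sample data only the first row of each record has a date, and 17042020 appears twice, so the line above does not produce the expected output as it stands.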
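A minimal sketch of the reset_index() route, assuming blank dates are empty strings as in the sample data. The block column and the summary name are hypothetical helpers introduced only for this illustration. The result has one row per record and is a plain DataFrame, so iterrows works on it as usual.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [16042020, "", "", 17042020, "", "", "", 17042020, "", "", ""],
    "des": ["I", "am", "happy", "I", "am", "a", "butterfly", "I", "am", "a", "girl"],
})

# Hypothetical helper key: running count of non-empty dates, one value per record
block = df["date"].astype(str).str.strip().ne("").cumsum()

summary = (
    df.assign(block=block)
      .groupby("block")
      .agg(date=("date", "first"), result=("des", " ".join))
      .reset_index()  # back to a regular DataFrame with block as a column
)

# iterrows() works on the aggregated frame
for _, row in summary.iterrows():
    print(row["date"], "->", row["result"])
```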
This is more complicated because the data structure has no explicit breaks. There are repeating dates that act as a break between concatenations. The approach:
build foo as an internal column that changes whenever a value is seen in the date column
use transform() on the des values to build result
set result to empty where there's no value in the date column
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [16042020, '', '', 17042020, '', '', '', 17042020, '', '', ''],
                   'des': ['I', 'am', 'happy', 'I', 'am', 'a', 'butterfly', 'I', 'am', 'a', 'girl']})
# foo = index of the most recent row that has a date, forward-filled over the blank rows
df["result"] = df.assign(foo=df.reset_index()
        .apply(lambda r: r["index"] if str(r["date"]).strip() != "" else np.nan, axis=1).ffill())\
    .groupby("foo")["des"].transform(lambda x: " ".join(x))
# date mixes ints and strings, so convert to str before stripping
df.loc[df["date"].astype(str).str.strip() == "", "result"] = ""
df
Output:
        date        des  result
0   16042020          I  I am happy
1                    am
2                 happy
3   17042020          I  I am a butterfly
4                    am
5                     a
6             butterfly
7   17042020          I  I am a girl
8                    am
9                     a
10                 girl