I have the following dataframe:
import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [124, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '146'],
], columns=['userid', 'name', 'dt'])
I have grouped it according to the date:
df2 = df.groupby('dt').apply(lambda df: df.reset_index(drop=True))
Now, the dataframe looks like this:
Now, I want to pivot the above so that each group is in the format userid, name_1, name_2, ..., name_k, and the final df looks something like this:
userid name
123 abc
124 abc
456 def
123 abc, abc
You can use cumcount with pivot_table, passing the columns userid and dt as the index parameter, so creating df2 does not appear to be necessary:
df['cols'] = 'name_' + (df.groupby(['userid', 'dt']).cumcount() + 1).astype(str)
print(df.pivot_table(index=['userid', 'dt'], columns='cols', values='name', aggfunc=''.join))
cols       name_1 name_2
userid dt
123    121    abc   None
       122    abc    abc
124    121    abc   None
456    121    def   None
       145    def    def
       146    def   None
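To see why this works, it can help to look at what cumcount returns on its own. The following is a minimal sketch rather than part of the original answer; the variable names position and labels are made up for illustration. cumcount numbers the rows inside each (userid, dt) group starting from 0, and those numbers become the name_1, name_2, ... column labels.

```python
import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [124, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '146'],
], columns=['userid', 'name', 'dt'])

# Position of each row within its (userid, dt) group, counting from 0.
# "position" and "labels" are illustrative names, not from the answer.
position = df.groupby(['userid', 'dt']).cumcount()

# Shift to 1-based and turn into the column labels used by pivot_table.
labels = 'name_' + (position + 1).astype(str)

print(pd.DataFrame({'userid': df['userid'], 'dt': df['dt'], 'label': labels}))
```

Only the second row of a repeated (userid, dt) pair gets the label name_2, which is why the pivoted result has exactly as many name_k columns as the largest group has rows.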
Check out groupby and apply. You can then unstack the extra level of the MultiIndex that is created.
df = df.set_index(['userid', 'dt'])['name']
df = df.groupby(level=[0, 1]).apply(
    lambda st: pd.Series(st.values, index=['name_%i' % i for i in range(st.shape[0])])
).unstack(level=-1)
print(df)
outputs

           name_0 name_1
userid dt
123    121    abc   None
       122    abc    abc
124    121    abc   None
456    121    def   None
       145    def    def
       146    def   None
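If the goal is instead the single comma-joined name column sketched in the question (for example "abc, abc"), a plain groupby with a string join may be enough. This is a hedged sketch, not something either answer proposes; the variable name joined is hypothetical, and sort=False is used only to keep the groups in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame([
    [123, 'abc', '121'],
    [124, 'abc', '121'],
    [456, 'def', '121'],
    [123, 'abc', '122'],
    [123, 'abc', '122'],
    [456, 'def', '145'],
    [456, 'def', '145'],
    [456, 'def', '146'],
], columns=['userid', 'name', 'dt'])

# Join all names within each (dt, userid) group into one string.
# sort=False keeps groups in the order they first appear in df.
joined = (
    df.groupby(['dt', 'userid'], sort=False)['name']
      .agg(', '.join)
      .reset_index()
)

print(joined[['userid', 'name']])
```

This yields one row per (dt, userid) pair, so no name_k columns are created and no missing values appear for smaller groups.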