[英]convert rows to columns in pandas dataframe
I'd like to convert to below Df1
to Df2
. 我想转换到低于
Df1
到Df2
。
The empty values would be filled with Nan
. 空值将用
Nan
填充。
Below Dfs are examples. Dfs下面是例子。
IDs are 100,000. ID是100,000。
Only week 8 has all IDs, so total rows will be 100,000. 只有第8周有所有ID,所以总行数为100,000。
I have Df3 which has 100,000 of id, and I want to merge df1 on Df3 formatted as df2. 我有Df3,其中有100,000个id,我想将Df3上的df1合并为df2格式。
ex) pd.merge(df3, df1, on="id", how="left")
-> but, formatted as df2
ex)
pd.merge(df3, df1, on="id", how="left")
- >但格式化为df2
Df1>
wk, id, col1, col2 ...
1 1 0.5 15
2 2 0.5 15
3 3 0.5 15
1 2 0.5 15
3 2 0.5 15
------
Df2>
wk1, id, col1, col2, wk2, id, col1, col2, wk3, id, col1, col2,...
1 1 0.5 15 2 1 Nan Nan 3 1 Nan Nan
1 2 0.5 15 2 2 0.5 15 3 2 0.5 15
1 3 Nan Nan 2 3 Nan Nan 3 3 0.5 15
Use: 使用:
#create dictionary for rename columns for correct sorting
d = dict(enumerate(df.columns))
d1 = {v:k for k, v in d.items()}
#first add missing values for each `wk` and `id`
df1 = df.set_index(['wk', 'id']).unstack().stack(dropna=False).reset_index()
#for each id create DataFrame, reshape by unstask and rename columns
df1 = (df1.groupby('id')
.apply(lambda x: pd.DataFrame(x.values, columns=df.columns))
.unstack()
.reset_index(drop=True)
.rename(columns=d1, level=0)
.sort_index(axis=1, level=1)
.rename(columns=d, level=0))
#convert values to integers if necessary
df1.loc[:, ['wk', 'id']] = df1.loc[:, ['wk', 'id']].astype(int)
#flatten MultiIndex in columns
df1.columns = ['{}_{}'.format(a, b) for a, b in df1.columns]
print (df1)
wk_0 id_0 col1_0 col2_0 wk_1 id_1 col1_1 col2_1 wk_2 id_2 col1_2 \
0 1 1 0.5 15.0 2 1 NaN NaN 3 1 NaN
1 1 2 0.5 15.0 2 2 0.5 15.0 3 2 0.5
2 1 3 NaN NaN 2 3 NaN NaN 3 3 0.5
col2_2
0 NaN
1 15.0
2 15.0
You can use GroupBy
+ concat
. 您可以使用
GroupBy
+ concat
。 The idea is to create a list of dataframes with appropriately named columns and appropriate index. 我们的想法是创建一个具有适当命名列和适当索引的数据帧列表。 The concatenate along
axis=1
: 沿
axis=1
连接axis=1
:
d = {k: v.reset_index(drop=True) for k, v in df.groupby('wk')}
def formatter(df, key):
return df.rename(columns={'w': f'wk{key}'}).set_index('id')
L = [formatter(df, key) for key, df in d.items()]
res = pd.concat(L, axis=1).reset_index()
print(res)
id wk col1 col2 wk col1 col2 wk col1 col2
0 1 1.0 0.5 15.0 NaN NaN NaN NaN NaN NaN
1 2 1.0 0.5 15.0 2.0 0.5 15.0 3.0 0.5 15.0
2 3 NaN NaN NaN NaN NaN NaN 3.0 0.5 15.0
Note NaN
forces your series to become float
. 注意
NaN
强制您的系列变为float
。 There's no "good" fix for this. 对此没有“好”的解决方法。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.