简体   繁体   English

将行转换为pandas dataframe中的列

[英]convert rows to columns in pandas dataframe

I'd like to convert to below Df1 to Df2 . 我想转换到低于Df1Df2

The empty values would be filled with Nan . 空值将用Nan填充。

Below Dfs are examples. Dfs下面是例子。

  • My data has weeks from 1 to 8. 我的数据有几周从1到8。
  • IDs are 100,000. ID是100,000。

  • Only week 8 has all IDs, so total rows will be 100,000. 只有第8周有所有ID,所以总行数为100,000。

I have Df3 which has 100,000 of id, and I want to merge df1 on Df3 formatted as df2. 我有Df3,其中有100,000个id,我想将Df3上的df1合并为df2格式。

ex) pd.merge(df3, df1, on="id", how="left") -> but, formatted as df2 ex) pd.merge(df3, df1, on="id", how="left") - >但格式化为df2

 Df1>
 wk, id, col1, col2  ...
 1    1   0.5  15  
 2    2   0.5  15  
 3    3   0.5  15  
 1    2   0.5  15  
 3    2   0.5  15  

 ------
 Df2>
 wk1, id, col1, col2, wk2, id, col1, col2, wk3,  id, col1, col2,...
 1    1   0.5  15      2    1   Nan   Nan   3    1   Nan   Nan
 1    2   0.5  15      2    2   0.5  15     3    2   0.5    15
 1    3   Nan  Nan     2    3   Nan   Nan   3    3   0.5    15

Use: 使用:

#create dictionary for rename columns for correct sorting
d = dict(enumerate(df.columns))
d1 = {v:k for k, v in d.items()}

#first add missing values for each `wk` and `id`
df1 = df.set_index(['wk', 'id']).unstack().stack(dropna=False).reset_index()

#for each id create DataFrame, reshape by unstask and rename columns
df1 = (df1.groupby('id')
       .apply(lambda x: pd.DataFrame(x.values, columns=df.columns))
       .unstack()
       .reset_index(drop=True)
       .rename(columns=d1, level=0)
       .sort_index(axis=1, level=1)
       .rename(columns=d, level=0))

#convert values to integers if necessary
df1.loc[:, ['wk', 'id']] = df1.loc[:, ['wk', 'id']].astype(int)

#flatten MultiIndex in columns
df1.columns = ['{}_{}'.format(a, b) for a, b in df1.columns]
print (df1)

   wk_0  id_0  col1_0  col2_0  wk_1  id_1  col1_1  col2_1  wk_2  id_2  col1_2  \
0     1     1     0.5    15.0     2     1     NaN     NaN     3     1     NaN   
1     1     2     0.5    15.0     2     2     0.5    15.0     3     2     0.5   
2     1     3     NaN     NaN     2     3     NaN     NaN     3     3     0.5   

   col2_2  
0     NaN  
1    15.0  
2    15.0  

You can use GroupBy + concat . 您可以使用GroupBy + concat The idea is to create a list of dataframes with appropriately named columns and appropriate index. 我们的想法是创建一个具有适当命名列和适当索引的数据帧列表。 The concatenate along axis=1 : 沿axis=1连接axis=1

d = {k: v.reset_index(drop=True) for k, v in df.groupby('wk')}

def formatter(df, key):
    return df.rename(columns={'w': f'wk{key}'}).set_index('id')

L = [formatter(df, key) for key, df in d.items()]
res = pd.concat(L, axis=1).reset_index()

print(res)

   id   wk  col1  col2   wk  col1  col2   wk  col1  col2
0   1  1.0   0.5  15.0  NaN   NaN   NaN  NaN   NaN   NaN
1   2  1.0   0.5  15.0  2.0   0.5  15.0  3.0   0.5  15.0
2   3  NaN   NaN   NaN  NaN   NaN   NaN  3.0   0.5  15.0

Note NaN forces your series to become float . 注意NaN强制您的系列变为float There's no "good" fix for this. 对此没有“好”的解决方法。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM