How to reframe the dataframe based on column and row values?

I have a dataframe as follows:

import pandas as pd

data = {
    'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
    'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M']
}
df = pd.DataFrame(data)

df looks like this:

    Title   Items
0   001C    A
1   001C    B
2   004C    D
3   001C    A
4   004C    A
5   004C    K
6   007C    L
7   010C    M

I want to get the unique Items under each Title, with one column per Title. The expected output is:

    001C    004C    007C    010C
0   A       D       L       M
1   B       A                
2           K                       

You can use drop_duplicates, assign a helper column that numbers the items within each Title group (0, 1, 2, ...), and then pivot on that column:

(df.drop_duplicates(subset=['Title', 'Items'])
   # number the rows per Title on the de-duplicated frame, so positions stay contiguous
   .assign(index=lambda d: d.groupby('Title').cumcount())
   .pivot(index='index', columns='Title', values='Items')
   .rename_axis(index=None, columns=None)
  #.fillna('') # uncomment if you want empty strings in place of NaNs
)

Output:

      001C 004C 007C 010C                 
0        A    D    L    M
1        B    A  NaN  NaN
2      NaN    K  NaN  NaN
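
To make the role of the helper column easier to see, here is a minimal step-by-step sketch of the same idea, run on the question's data. The names deduped, pos and wide are hypothetical and chosen only for illustration. The sketch computes the counter on the de-duplicated frame, which should keep the positions contiguous even when a dropped duplicate sits in the middle of a group.

```python
import pandas as pd

df = pd.DataFrame({
    'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
    'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M'],
})

# Step 1: keep only the first occurrence of each (Title, Items) pair.
deduped = df.drop_duplicates(subset=['Title', 'Items'])

# Step 2: number the remaining rows 0, 1, 2, ... within each Title.
# 'pos' is a hypothetical name for the helper column.
deduped = deduped.assign(pos=deduped.groupby('Title').cumcount())
print(deduped)

# Step 3: the helper column becomes the row index and the titles become columns.
wide = (deduped.pivot(index='pos', columns='Title', values='Items')
               .rename_axis(index=None, columns=None))
print(wide)
```

Printing deduped before the pivot shows which row each item will land in, which is handy when the result has unexpected NaN rows.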

You can also use .drop_duplicates() + .pivot(). The pivot leaves each value on its original row, so you then move the non-NaN values of each column to the top with .dropna(), as follows:

(df.drop_duplicates()
   .pivot(columns='Title', values='Items')
   .apply(lambda x: pd.Series(x.dropna().values))
   .rename_axis(columns=None)
)

Result:

  001C 004C 007C 010C
0    A    D    L    M
1    B    A  NaN  NaN
2  NaN    K  NaN  NaN
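
The two stages of this approach can be inspected separately. The following is a rough sketch on the question's data, with sparse and compact as hypothetical variable names: the first frame shows the scattered layout right after the pivot, the second shows the values stacked at the top of each column.

```python
import pandas as pd

df = pd.DataFrame({
    'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
    'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M'],
})

# Without an index argument, pivot keeps the original row labels,
# so every row holds exactly one non-NaN value.
sparse = df.drop_duplicates().pivot(columns='Title', values='Items')
print(sparse)

# Dropping the NaNs per column and wrapping the values in a fresh Series
# relabels them 0..n-1, which stacks them at the top of each column.
compact = (sparse.apply(lambda col: pd.Series(col.dropna().to_numpy()))
                 .rename_axis(columns=None))
print(compact)
```

Because the compaction runs a Python function once per column, this variant may be slower than the helper-column pivot when there are many distinct titles.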
