
Python Pandas Pivot Of Two columns (ColumnName and Value)

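I have a pandas DataFrame containing two columns plus a default index. The first column holds the intended "column name", and the second column holds the value required for that column.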

    name            returnattribute
0   Customer Name   Customer One Name
1   Customer Code   CGLOSPA
2   Customer Name   Customer Two Name
3   Customer Code   COTHABA
4   Customer Name   Customer Three Name
5   Customer Code   CGLOADS
6   Customer Name   Customer Four Name
7   Customer Code   CAPRCANBRA
8   Customer Name   Customer Five Name
9   Customer Code   COTHAMO

I want to reshape this so that I have 5 rows of two columns ("Customer Name" and "Customer Code") instead of 10 rows. The desired result is:

    Customer Code   Customer Name
0   CGLOSPA         Customer One Name
1   COTHABA         Customer Two Name
2   CGLOADS         Customer Three Name
3   CAPRCANBRA      Customer Four Name
4   COTHAMO         Customer Five Name

I tried the pandas pivot function:

df.pivot(columns='name', values='returnattribute')

But this still yields ten rows, with NaN in alternating cells:

    Customer Code   Customer Name
0   NaN             Customer One Name
1   CGLOSPA         NaN
2   NaN             Customer Two Name
3   COTHABA         NaN
4   NaN             Customer Three Name
5   CGLOADS         NaN
6   NaN             Customer Four Name
7   CAPRCANBRA      NaN
8   NaN             Customer Five Name
9   COTHAMO         NaN

How can I pivot the DataFrame to get just 5 rows of two columns?

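In df.pivot, when the index argument is not passed, df.index is used by default. Each of the ten original rows therefore keeps its own index label, which is why the output has ten rows.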

index : str or object or a list of str, optional

  • Column to use to make new frame's index. If None, uses existing index.
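To make the default-index behaviour concrete, here is a minimal sketch that rebuilds the question's data and checks the shape of the result. The variable name `wide` is only illustrative and does not come from the original post.

```python
import pandas as pd

# Rebuild the question's DataFrame: names and codes alternate row by row.
df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

# No index= argument: the existing index 0..9 becomes the row index,
# so every original row stays on a row of its own.
wide = df.pivot(columns="name", values="returnattribute")

print(wide.shape)         # (10, 2)
print(wide.isna().sum())  # 5 missing values in each column
```

Each index label occurs only once, so pivot has nothing to pair up. Half of the cells in each column stay empty.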

To get the desired output, you have to create a new index column that is shared by the two rows of each customer, as shown below.

df.assign(idx=df.index//2).pivot(index='idx', columns='name', values='returnattribute')

# name Customer Code        Customer Name
# idx                                    
# 0          CGLOSPA    Customer One Name
# 1          COTHABA    Customer Two Name
# 2          CGLOADS  Customer Three Name
# 3       CAPRCANBRA   Customer Four Name
# 4          COTHAMO   Customer Five Name
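The result above still carries the axis labels `name` and `idx`. If the exact layout from the question is wanted, one possible clean-up is sketched below. It assumes, as the answer does, that the rows strictly alternate between a name and a code. The `rename_axis` and `reset_index` steps are an addition for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

out = (
    df.assign(idx=df.index // 2)   # rows 0,1 -> 0; rows 2,3 -> 1; and so on
      .pivot(index="idx", columns="name", values="returnattribute")
      .rename_axis(columns=None)   # drop the "name" label on the columns
      .reset_index(drop=True)      # replace the "idx" index with 0..4
)

print(out)
```

The integer division `df.index // 2` only pairs rows correctly when the default 0..n-1 index is in place and no row is missing. With irregular data, the grouping key would have to be derived differently.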

Since every two rows represent one data point, you can `reshape` the values and then build the required DataFrame.

reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
#        ['Customer Two Name', 'COTHABA'],
#        ['Customer Three Name', 'CGLOADS'],
#        ['Customer Four Name', 'CAPRCANBRA'],
#        ['Customer Five Name', 'COTHAMO']], dtype=object)

col_names = pd.unique(df['name'])
# array(['Customer Name', 'Customer Code'], dtype=object)

out = pd.DataFrame(reshaped, columns=col_names)

#          Customer Name Customer Code
# 0    Customer One Name       CGLOSPA
# 1    Customer Two Name       COTHABA
# 2  Customer Three Name       CGLOADS
# 3   Customer Four Name    CAPRCANBRA
# 4   Customer Five Name       COTHAMO

# we can reorder the columns using reindex.
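The reordering step mentioned above is not spelled out in the answer. A minimal sketch of the whole reshape approach, including the reorder, could look like this. The column order passed to `reindex` is taken from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

# Fold the flat value column into rows of (name, code) pairs.
reshaped = df["returnattribute"].to_numpy().reshape(-1, 2)

# Column labels in order of first appearance: name first, then code.
col_names = pd.unique(df["name"])

out = pd.DataFrame(reshaped, columns=col_names)

# Put "Customer Code" first, as in the desired output.
out = out.reindex(columns=["Customer Code", "Customer Name"])

print(out)
```

Like the index-based answer, this relies on the rows strictly alternating. A missing row would shift every later pair, or make the reshape fail if the number of values is odd.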

You can also pass the new index directly to pivot_table, using aggfunc='first' because you have non-numeric data:

df.pivot_table(index=df.index//2, columns='name',
               values='returnattribute', aggfunc='first')

output:

name Customer Code        Customer Name
0          CGLOSPA    Customer One Name
1          COTHABA    Customer Two Name
2          CGLOADS  Customer Three Name
3       CAPRCANBRA   Customer Four Name
4          COTHAMO   Customer Five Name
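As a quick sanity check, the sketch below compares the pivot_table result with the explicit `idx` approach on the question's data. The names `via_pivot`, `via_table` and `same` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

# Explicit helper column, then pivot.
via_pivot = df.assign(idx=df.index // 2).pivot(
    index="idx", columns="name", values="returnattribute"
)

# Same grouping key passed straight to pivot_table; "first" keeps the
# single string found in each (row group, column) cell.
via_table = df.pivot_table(
    index=df.index // 2, columns="name",
    values="returnattribute", aggfunc="first",
)

# Compare the cell values; the index labels differ only in their name.
same = bool((via_pivot.to_numpy() == via_table.to_numpy()).all())
print(same)
```

On this data, the two routes should give the same values. They differ when a group contains the same column name twice: pivot raises an error on such duplicate entries, while pivot_table with `aggfunc='first'` keeps only the first value.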
