Python Pandas Pivot: Two Columns (ColumnName and Value)

Question

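I have a pandas DataFrame with two columns plus a default index. The first column holds the intended column name, and the second holds the value for that column.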

    name            returnattribute
0   Customer Name   Customer One Name
1   Customer Code   CGLOSPA
2   Customer Name   Customer Two Name
3   Customer Code   COTHABA
4   Customer Name   Customer Three Name
5   Customer Code   CGLOADS
6   Customer Name   Customer Four Name
7   Customer Code   CAPRCANBRA
8   Customer Name   Customer Five Name
9   Customer Code   COTHAMO

I want to reshape this so that I have 5 rows with two columns ("Customer Name" and "Customer Code") instead of 10 rows. The desired result is:

    Customer Code   Customer Name
0   CGLOSPA         Customer One Name
1   COTHABA         Customer Two Name
2   CGLOADS         Customer Three Name
3   CAPRCANBRA      Customer Four Name
4   COTHAMO         Customer Five Name

I tried the pandas pivot function:

df.pivot(columns='name', values='returnattribute')

But this still produces ten rows, with alternating blanks:

    Customer Code   Customer Name
0   NaN             Customer One Name
1   CGLOSPA         NaN
2   NaN             Customer Two Name
3   COTHABA         NaN
4   NaN             Customer Three Name
5   CGLOADS         NaN
6   NaN             Customer Four Name
7   CAPRCANBRA      NaN
8   NaN             Customer Five Name
9   COTHAMO         NaN

How can I pivot the DataFrame to get just 5 rows with two columns?

Answer 1

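In df.pivot, when the index argument is not passed, df.index is used by default. Each of the ten rows has its own index label, so each one lands on its own row of the result, with NaN in the other column. Hence the output you see.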

From the DataFrame.pivot documentation:

index : str or object or list of str, optional

Column to use to make new frame's index. If None, uses existing index.

To get the desired output, you have to create a new index column in which both rows of a pair share the same value, as shown below.

df.assign(idx=df.index//2).pivot(index='idx', columns='name', values='returnattribute')

# name Customer Code        Customer Name
# idx                                    
# 0          CGLOSPA    Customer One Name
# 1          COTHABA    Customer Two Name
# 2          CGLOADS  Customer Three Name
# 3       CAPRCANBRA   Customer Four Name
# 4          COTHAMO   Customer Five Name
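Note that df.index//2 relies on the default 0..n-1 index and on the rows strictly alternating between the two names. As a minimal sketch of a variant that does not depend on the index values, the pair number can be derived by counting occurrences of each name with groupby(...).cumcount(). The sample frame below is rebuilt from the question's data, and the names occurrence and result are only illustrative. It still assumes that the k-th "Customer Name" belongs with the k-th "Customer Code".

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

# Number each occurrence of a given name: 0 for its first row, 1 for its second, ...
occurrence = df.groupby("name").cumcount()

# Rows that share an occurrence number are pivoted onto the same output row.
result = (
    df.assign(idx=occurrence)
      .pivot(index="idx", columns="name", values="returnattribute")
)
print(result)
```

If a name is missing for some record, the pairing by occurrence number will silently shift, so this is only appropriate when both names appear exactly once per record.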

Alternatively, since every two rows represent one data point, you can use `reshape` on the values and then build the required DataFrame.

reshaped = df['returnattribute'].to_numpy().reshape(-1, 2)
# array([['Customer One Name', 'CGLOSPA'],
#        ['Customer Two Name', 'COTHABA'],
#        ['Customer Three Name', 'CGLOADS'],
#        ['Customer Four Name', 'CAPRCANBRA'],
#        ['Customer Five Name', 'COTHAMO']], dtype=object)

col_names = pd.unique(df.name)
# array(['Customer Name', 'Customer Code'], dtype=object)

out = pd.DataFrame(reshaped, columns=col_names)

#          Customer Name Customer Code
# 0    Customer One Name       CGLOSPA
# 1    Customer Two Name       COTHABA
# 2  Customer Three Name       CGLOADS
# 3   Customer Four Name    CAPRCANBRA
# 4   Customer Five Name       COTHAMO

# we can reorder the columns using reindex.
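As a small illustration of that last remark, the sketch below reorders the columns with reindex so that "Customer Code" comes first, as in the desired output. The frame out is rebuilt here with only two of the rows so the snippet runs on its own, and reordered is just an illustrative name.

```python
import pandas as pd

# A shortened stand-in for the `out` frame built above (first two records only).
out = pd.DataFrame(
    [["Customer One Name", "CGLOSPA"],
     ["Customer Two Name", "COTHABA"]],
    columns=["Customer Name", "Customer Code"],
)

# reindex(columns=...) returns a new frame with the columns in the given order.
reordered = out.reindex(columns=["Customer Code", "Customer Name"])
print(reordered)
```

Plain column selection, out[["Customer Code", "Customer Name"]], gives the same result here. The difference is that reindex fills a column with NaN if the label does not exist, whereas selection raises a KeyError.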

Answer 2

You can also pass the new index directly to pivot_table, using aggfunc='first' because you have non-numeric data:

df.pivot_table(index=df.index//2, columns='name',
               values='returnattribute', aggfunc='first')

Output:

name Customer Code        Customer Name
0          CGLOSPA    Customer One Name
1          COTHABA    Customer Two Name
2          CGLOADS  Customer Three Name
3       CAPRCANBRA   Customer Four Name
4          COTHAMO   Customer Five Name
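The result above still carries "name" as a label on the column axis, which the desired output in the question does not show. A possible clean-up, sketched under the assumption that only the label should go, is to chain rename_axis(columns=None) onto the same pivot_table call. The sample frame is rebuilt from the question's data, and tidy is an illustrative name.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "name": ["Customer Name", "Customer Code"] * 5,
    "returnattribute": [
        "Customer One Name", "CGLOSPA",
        "Customer Two Name", "COTHABA",
        "Customer Three Name", "CGLOADS",
        "Customer Four Name", "CAPRCANBRA",
        "Customer Five Name", "COTHAMO",
    ],
})

tidy = (
    df.pivot_table(index=df.index // 2, columns="name",
                   values="returnattribute", aggfunc="first")
      .rename_axis(columns=None)  # drop the "name" label from the column axis
)
print(tidy)
```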
