Pandas: Create columns with tuples as labels from unique row value pairs

Question

Imagine a df like this:

timestamp	data_point_1	data_point_2	some_data
2021/06/24	a	b	2
2021/06/24	c	d	3
2021/06/25	c	d	3

I want to change it to a df like this, whose columns are tuples of the unique value pairs from data_point_1 and data_point_2, holding the some_data value for each timestamp:

timestamp	(a,b)	(c,d)
2021/06/24	2	3
2021/06/25	NaN	3

Here's the example data snippet:

import pandas as pd

test = pd.DataFrame({'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"], 'data_point_1': ["a", "c", "c"], 'data_point_2': ["b", "d", "d"], 'some_data': [2, 3, 3]})

print(test)
#    timestamp data_point_1 data_point_2  some_data
# 0  2021/06/24            a            b          2
# 1  2021/06/24            c            d          3
# 2  2021/06/25            c            d          3

# desired:
#    timestamp   (a,b)       (c,d)
# 0  2021/06/24    2           3
# 1  2021/06/25    0           3

Thanks :)

Answer 1

Use DataFrame.pivot, then convert the MultiIndex column values to tuples:

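# keyword arguments are used because newer pandas versions no longer accept positional ones here
df = test.pivot(index='timestamp', columns=['data_point_1','data_point_2'], values='some_data')
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()
print(df)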
    timestamp  (a, b)  (c, d)
0  2021/06/24     2.0     3.0
1  2021/06/25     NaN     3.0
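
The code comment in the question shows 0 rather than NaN for the missing (a, b) value on 2021/06/25. If that is the result wanted, one option is to fill the gaps after pivoting. This is only a minimal sketch, assuming a pandas version whose DataFrame.pivot accepts a list for columns; the name wide is illustrative and not from the original answer:

```python
import pandas as pd

test = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"],
    'data_point_1': ["a", "c", "c"],
    'data_point_2': ["b", "d", "d"],
    'some_data': [2, 3, 3],
})

wide = test.pivot(index='timestamp',
                  columns=['data_point_1', 'data_point_2'],
                  values='some_data')
# Flatten the two-level column index into plain tuple labels
wide.columns = [tuple(x) for x in wide.columns]
# Replace missing combinations with 0 and go back to integers
# (the NaN introduced by the pivot had forced a float dtype)
wide = wide.fillna(0).astype(int).reset_index()
print(wide)
```

Note that filling with 0 makes "no row for this pair" indistinguishable from a real value of 0, so keeping NaN may be the safer default.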

If you need to aggregate values, that is, if there are duplicate rows for the same timestamp, data_point_1, data_point_2 combination, use DataFrame.pivot_table with an aggregate function such as mean:

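# if aggregation is needed (duplicate rows per timestamp and value pair)
df = test.pivot_table(index='timestamp',
                      columns=['data_point_1','data_point_2'],
                      values='some_data',
                      aggfunc='mean')
# convert the MultiIndex columns to tuple labels, as before
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()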
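
To illustrate the duplicate case, here is a small sketch with one extra, made-up row that repeats the key (2021/06/24, c, d). The frame dup and its values are hypothetical and not part of the question's data. DataFrame.pivot raises a ValueError on such input, while pivot_table averages the repeated values:

```python
import pandas as pd

# Hypothetical data: the last row duplicates the key (2021/06/24, c, d)
dup = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25", "2021/06/24"],
    'data_point_1': ["a", "c", "c", "c"],
    'data_point_2': ["b", "d", "d", "d"],
    'some_data': [2, 3, 3, 5],
})

# pivot cannot decide which of the duplicate values to keep
try:
    dup.pivot(index='timestamp',
              columns=['data_point_1', 'data_point_2'],
              values='some_data')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot failed:", err)

# pivot_table resolves duplicates with the given aggregate function
agg = dup.pivot_table(index='timestamp',
                      columns=['data_point_1', 'data_point_2'],
                      values='some_data',
                      aggfunc='mean')
agg.columns = [tuple(x) for x in agg.columns]
agg = agg.reset_index()
print(agg)
```

Here the two (c, d) values for 2021/06/24, 3 and 5, are combined into their mean. Another aggfunc such as 'sum' or 'first' may fit better, depending on what the duplicates represent.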

Answer 1
2 Accepted 2021-06-24 11:26:55
