
Pandas: create columns with tuples as labels from unique pairs of row values

Imagine a df like this:

timestamp   data_point_1  data_point_2  some_data
2021/06/24  a             b             2
2021/06/24  c             d             3
2021/06/25  c             d             3

I want to reshape it into a df like this, whose columns are labeled with tuples of the unique value pairs from columns data_point_1 and data_point_2, and which holds only the some_data value for each timestamp:

timestamp   (a,b)  (c,d)
2021/06/24  2      3
2021/06/25  NaN    3

Here's the example data snippet:

import pandas as pd

test = pd.DataFrame({'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"], 'data_point_1': ["a", "c", "c"], 'data_point_2': ["b", "d", "d"], 'some_data': [2, 3, 3]})

print(test)
#    timestamp data_point_1 data_point_2  some_data
# 0  2021/06/24            a            b          2
# 1  2021/06/24            c            d          3
# 2  2021/06/25            c            d          3

# desired:
#    timestamp   (a,b)       (c,d)
# 0  2021/06/24    2           3
# 1  2021/06/25    0           3

Thanks :)

Use DataFrame.pivot, then convert the resulting MultiIndex column values to tuples:

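# keyword arguments are used because recent pandas versions no longer accept positional ones here
df = test.pivot(index='timestamp', columns=['data_point_1', 'data_point_2'], values='some_data')
# flatten the two-level column labels into plain tuples
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()
print(df)
#     timestamp  (a, b)  (c, d)
# 0  2021/06/24     2.0     3.0
# 1  2021/06/25     NaN     3.0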
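
The comment in the question shows 0 rather than NaN for the missing pair. A minimal sketch of one way to get that, assuming it is acceptable to treat a missing pair as zero: the name `wide` is only illustrative, and `MultiIndex.to_flat_index()` is used here as an alternative to the list comprehension above.

```python
import pandas as pd

test = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"],
    'data_point_1': ["a", "c", "c"],
    'data_point_2': ["b", "d", "d"],
    'some_data': [2, 3, 3],
})

# "wide" is an illustrative name, not part of the original answer
wide = test.pivot(index='timestamp',
                  columns=['data_point_1', 'data_point_2'],
                  values='some_data')

# to_flat_index() turns the two-level column labels into an Index of tuples
wide.columns = wide.columns.to_flat_index()

# assumption: a missing (data_point_1, data_point_2) pair should count as 0
wide = wide.fillna(0).astype(int).reset_index()
print(wide)
```

Filling with 0 hides the difference between "no row for this pair" and "a row whose value is 0", so keep the NaN if that distinction matters.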

If there are duplicate rows per timestamp, data_point_1, data_point_2 combination, the values need to be aggregated. In that case use DataFrame.pivot_table with an aggregate function such as mean:

# if values need to be aggregated
df = test.pivot_table(index='timestamp',
                      columns=['data_point_1', 'data_point_2'],
                      values='some_data',
                      aggfunc='mean')
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()
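
To illustrate when the aggregating variant is needed, here is a small sketch on hypothetical data (the frame `dup` and its values are made up for this example) in which the pair (a, b) occurs twice on the same timestamp. DataFrame.pivot raises a ValueError on such input, while pivot_table averages the duplicates.

```python
import pandas as pd

# hypothetical data: the (a, b) pair appears twice on 2021/06/24
dup = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/24", "2021/06/25"],
    'data_point_1': ["a", "a", "c", "c"],
    'data_point_2': ["b", "b", "d", "d"],
    'some_data': [2, 4, 3, 3],
})

# pivot cannot reshape duplicated index/column combinations
try:
    dup.pivot(index='timestamp',
              columns=['data_point_1', 'data_point_2'],
              values='some_data')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (here with the mean)
agg = dup.pivot_table(index='timestamp',
                      columns=['data_point_1', 'data_point_2'],
                      values='some_data',
                      aggfunc='mean')
agg.columns = [tuple(x) for x in agg.columns]
agg = agg.reset_index()
print(pivot_failed)
print(agg)
```

Other aggregations such as 'sum' or 'first' can be passed as aggfunc if the mean is not the right way to combine duplicates.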
