Imagine a df like this:
timestamp | data_point_1 | data_point_2 | some_data |
---|---|---|---|
2021/06/24 | a | b | 2 |
2021/06/24 | c | d | 3 |
2021/06/25 | c | d | 3 |
I want to change it to a df like this, with one column per unique (data_point_1, data_point_2) value pair (as a tuple) and only the some_data value for each timestamp:
timestamp | (a,b) | (c,d) |
---|---|---|
2021/06/24 | 2 | 3 |
2021/06/25 | NaN | 3 |
Here's the example data snippet:
import pandas as pd
test = pd.DataFrame({'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"], 'data_point_1': ["a", "c", "c"], 'data_point_2': ["b", "d", "d"], 'some_data': [2, 3, 3]})
print(test)
# timestamp data_point_1 data_point_2 some_data
# 0 2021/06/24 a b 2
# 1 2021/06/24 c d 3
# 2 2021/06/25 c d 3
# desired:
# timestamp (a,b) (c,d)
# 0 2021/06/24 2 3
# 1 2021/06/25 0 3
Thanks :)
Use DataFrame.pivot and convert the resulting MultiIndex column values to tuples:
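# keyword arguments: recent pandas versions no longer accept them positionally
df = test.pivot(index='timestamp', columns=['data_point_1','data_point_2'], values='some_data')
# flatten the two-level column MultiIndex into plain tuple labels
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()
print(df)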
timestamp (a, b) (c, d)
0 2021/06/24 2.0 3.0
1 2021/06/25 NaN 3.0
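A self-contained version of the same steps, as a sketch: it assumes pandas 1.1 or newer (needed for passing a list to `columns=` in `pivot`), swaps the list comprehension for `MultiIndex.to_flat_index()`, which also yields tuple column labels, and shows an optional `fillna(0)` for the case where you want 0 instead of NaN, as in the commented desired output in the question.

```python
import pandas as pd

test = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/25"],
    'data_point_1': ["a", "c", "c"],
    'data_point_2': ["b", "d", "d"],
    'some_data': [2, 3, 3],
})

# One column per (data_point_1, data_point_2) pair, one row per timestamp.
df = test.pivot(index='timestamp',
                columns=['data_point_1', 'data_point_2'],
                values='some_data')

# to_flat_index() turns the two-level column MultiIndex into an Index of tuples.
df.columns = df.columns.to_flat_index()

# Optional: fill missing pairs with 0 and restore the integer dtype
# that the NaN had forced to float.
df_zero = df.fillna(0).astype(int).reset_index()

df = df.reset_index()
print(df)
print(df_zero)
```

`astype(int)` is only safe after `fillna`, because an integer column cannot hold NaN.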
If you need to aggregate values, that means there are duplicate timestamp, data_point_1, data_point_2 combinations (pivot then raises a ValueError). In that case use DataFrame.pivot_table with an aggregate function such as mean:
# if you need to aggregate values
df = test.pivot_table(index='timestamp',
                      columns=['data_point_1','data_point_2'],
                      values='some_data',
                      aggfunc='mean')
df.columns = [tuple(x) for x in df.columns]
df = df.reset_index()
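To see why `pivot_table` is needed, here is a small sketch with a hypothetical duplicate row (the `dup` frame and its values are made up for illustration): `pivot` raises a `ValueError` because two values compete for one cell, while `pivot_table` averages them with `aggfunc='mean'`.

```python
import pandas as pd

# Hypothetical data: the first two rows share the same
# timestamp / data_point_1 / data_point_2 combination.
dup = pd.DataFrame({
    'timestamp': ["2021/06/24", "2021/06/24", "2021/06/24", "2021/06/25"],
    'data_point_1': ["a", "a", "c", "c"],
    'data_point_2': ["b", "b", "d", "d"],
    'some_data': [2, 4, 3, 3],
})

# pivot cannot put two values into one cell, so it raises ValueError.
try:
    dup.pivot(index='timestamp',
              columns=['data_point_1', 'data_point_2'],
              values='some_data')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates instead (mean of 2 and 4 is 3.0).
df = dup.pivot_table(index='timestamp',
                     columns=['data_point_1', 'data_point_2'],
                     values='some_data',
                     aggfunc='mean')
df.columns = df.columns.to_flat_index()
df = df.reset_index()
print(df)
```

Swap `'mean'` for `'sum'`, `'first'` or another aggregation if averaging is not what the duplicates should mean.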